REMARKS

Attached hereto is a marked-up version of the changes made to the specification and
claims by the current amendment. The attached page is captioned "VERSION WITH
MARKINGS TO SHOW CHANGES MADE."

1. Rejection of Claims 27 and 28 Under 35 U.S.C. § 112, second paragraph

The Examiner rejected Claims 27 and 28 under 35 U.S.C. § 112, second paragraph, for 
alleged indefiniteness. In order to expedite prosecution, Claims 27 and 28 have been canceled.
Hence, this issue is moot. 

2. Rejection of Claims 19, 20, and 22 Under 35 U.S.C. § 112, first paragraph, enablement

New Claims 33, 34, and 35 correspond to canceled Claims 19, 20, and 22. The Examiner
maintained the rejections of Claims 19, 20, and 22 under 35 U.S.C. § 112, first paragraph, stating
that "[t]he specification does not enable any person skilled in the art to which it pertains, or with 
which it is most nearly connected, to make and use the invention commensurate in scope with 
these claims." (Office Action, page 4.) In particular, it was asserted that the specification does 
not describe how to use the polynucleotide variants recited by the claims. 

However, Applicants note that the naturally occurring polynucleotide variants and the 
polynucleotides encoding the naturally occurring polypeptide variants are expressed genes. 
Because the claimed polynucleotide sequences are expressed and encode expressed polypeptides, 
a person of ordinary skill in the art would know how to use the claimed polynucleotide sequences 
- without any guesswork - in toxicology testing, drug development, and disease diagnosis
regardless of the activity of the encoded polypeptide. The claimed invention could be used, for 
example, in a toxicology test to determine whether a drug or toxin causes any change in the 
expression of kinase-related proteins. Similarly, the claimed invention could be used to 
determine whether a specific medical condition, such as cancer, affects the expression of kinase- 
related proteins and, perhaps in conjunction with other information, serve as a marker for or to 
assess the stage of a particular disease or condition. (See specification at, e.g., page 14, line 24 
through page 15, line 13; and page 22, line 27 through page 23, line 8.)

In fact, the claimed polynucleotide sequences could be used in toxicology testing and 
diagnosis without any knowledge (although this is not the case here) of the function of the 
proteins they encode: they could serve, for example, as markers of a toxic response, or, 
alternatively, if levels of the claimed polynucleotides remain unchanged during a toxic response, 
as a control in toxicology testing. Diagnosis of disease (or fingerprinting using expression 
profiles) can be achieved using arrays of numerous identifiable, expressed DNA sequences, or by 
two-dimensional gel analysis of the expressed proteins themselves, notwithstanding lack of any 
knowledge of the specific functions of the proteins they encode. 

These well-known and well-established uses, which are also disclosed in the Specification
at, e.g., page 14, line 24 through page 15, line 13; page 22, line 27 through page 23, line 8; and
page 30, line 26 through page 31, line 21, are outlined below. In recent years, scientists have 
developed important techniques for toxicology testing, drug development, and disease diagnosis. 
Many of these techniques rely on expression profiling, in which the expression of numerous 
genes is compared in two or more samples. Genes or gene fragments known to be expressed are 
tools essential to any technology that uses expression profiling. Likewise, proteome expression 
profiling techniques have been developed in which the expression of numerous polypeptides is 
compared in two or more samples. Polypeptides or polypeptide fragments known to be 
expressed are tools essential to any technology that uses proteome expression profiling. See, 
e.g., Sandra Steiner and N. Leigh Anderson, Expression profiling in toxicology - potentials and
limitations, Toxicology Letters 112-13:467 (2000) (Reference No. 1).

The technologies made possible by expression profiling and the DNA and polypeptide 
tools upon which they rely are now well-established. The technical literature recognizes not only 
the prevalence of these technologies, but also their unprecedented advantages in drug 
development, testing and safety assessment. One of these techniques is toxicology testing, used 
in both drug development and safety assessment. Toxicology testing is now standard practice in 
the pharmaceutical industry. See, e.g., John C. Rockett, et al., Differential gene expression in
drug metabolism and toxicology: practicalities, problems, and potential, Xenobiotica 29:655-691
(1999) (Reference No. 2):

Knowledge of toxin-dependent regulation in target tissues is not solely an 
academic pursuit as much interest has been generated in the pharmaceutical
industry to harness this technology in the early identification of toxic drug 
candidates, thereby shortening the developmental process and contributing 
substantially to the safety assessment of new drugs. (Reference No. 2, page 656.) 

To the same effect are several other scientific publications, including Emile F. Nuwaysir, et al.,
Microarrays and Toxicology: The Advent of Toxicogenomics, Molecular Carcinogenesis 24:153-159
(1999) (Reference No. 3); Sandra Steiner and N. Leigh Anderson, supra.

Nucleic acids useful for measuring the expression of whole classes of genes are routinely
incorporated for use in toxicology testing. Nuwaysir et al. describe, for example, a Human
ToxChip comprising 2089 human clones, which were selected

... for their well-documented involvement in basic cellular processes as well as 
their responses to different types of toxic insult. Included on this list are DNA 
replication and repair genes, apoptosis genes, and genes responsive to PAHs and 
dioxin-like compounds, peroxisome proliferators, estrogenic compounds, and 
oxidant stress. Some of the other categories of genes include transcription factors, 
oncogenes, tumor suppressor genes, cyclins, kinases, phosphatases, cell adhesion 
and motility genes, and homeobox genes. Also included in this group are 84 
housekeeping genes, whose hybridization intensity is averaged and used for signal 
normalization of the other genes on the chip. (Reference No. 3, page 156, 
emphasis added.) 

See also Table 1 of Nuwaysir et al. (listing additional classes of genes deemed to be of special 
interest in making a human toxicology microarray). 

The more genes that are available for use in toxicology testing, the more powerful the 
technique. "Arrays are at their most powerful when they contain the entire genome of the species 
they are being used to study." John C. Rockett and David J. Dix, Application of DNA Arrays to
Toxicology, Environ. Health Perspec. 107:681-685 (1999). (Reference No. 4, see page 683.)
Control genes are carefully selected for their stability across a large set of array experiments in 
order to best study the effect of toxicological compounds. See attached email from the primary 
investigator on the Nuwaysir paper, Dr. Cynthia Afshari, to an Incyte employee, dated July 3, 
2000, as well as the original message to which she was responding (Reference No. 5), indicating 
that even the expression of carefully selected control genes can be altered. Thus, there is no 
expressed gene which is irrelevant to screening for toxicological effects, and all expressed genes 
have a utility for toxicological screening. This is true for both polynucleotides and polypeptides 
encoded by them. 

There are numerous additional uses for the information made possible by expression
profiling. Expression profiling is used to identify drug targets and characterize disease. See 
Rockett et al., supra. It also is used in tissue profiling, developmental biology, disease staging, 
etc. There is simply no doubt that the sequences of expressed human genes all have practical, 
substantial and credible real-world uses, at the very least for expression profiling. 

Expression profiling technology is also used to identify drug targets and analyze disease
at the molecular level, thus accelerating the drug development process. For example, expression 
profiling is useful for the elucidation of biochemical pathways, each pathway comprising a 
multitude of component polypeptides and thus providing a pool of potential drug targets. In this 
manner, expression profiling leads to the optimization of drug target identification and a 
comprehensive understanding of disease etiology and progression. 

There is simply no doubt that the sequences of expressed human polynucleotides and
polypeptides all have practical, substantial and credible real-world utilities, at the very least for
biochemical pathway elucidation, drug target identification, and assessment of toxicity and
treatment efficacy in the drug development process. Sandra Steiner and N. Leigh Anderson,
supra, have elaborated on this topic as follows:

The rapid progress in genomics and proteomics technologies creates a 
unique opportunity to dramatically improve the predictive power of safety 
assessment and to accelerate the drug development process. Application of gene 
and protein expression profiling promises to improve lead selection, resulting in 
the development of drug candidates with higher efficacy and lower toxicity. The 
identification of biologically relevant surrogate markers correlated with treatment 
efficacy and safety bears a great potential to optimize the monitoring of pre- 
clinical and clinical trials. (Reference No. 1, page 470.) 

In fact, the potential benefits to the public, in terms of lives saved and reduced health care
costs, are enormous. Recent developments provide evidence that the benefits of this information
are already beginning to manifest themselves. Examples include the following:

• In 1999, CV Therapeutics, an Incyte collaborator, was able to use Incyte gene
expression technology, information about the structure of a known transporter 
gene, and chromosomal mapping location, to identify the key gene associated with 
Tangier disease. This discovery took place over a matter of only a few weeks, due 
to the power of these new genomics technologies. The discovery received an 
award from the American Heart Association as one of the top 10 discoveries 
associated with heart disease research in 1999. 

• In an April 9, 2000, article published by the Bloomberg news service, an Incyte
customer stated that it had reduced the time associated with target discovery and
validation from 36 months to 18 months, through use of Incyte's genomic
information database. Other Incyte customers have privately reported similar 
experiences. The implications of this significant saving of time and expense for 
the number of drugs that may be developed and their cost are obvious. 



• In a February 10, 2000, article in the Wall Street Journal, one Incyte customer
stated that over 50 percent of the drug targets in its current pipeline were derived
from the Incyte database. Other Incyte customers have privately reported similar 
experiences. By doubling the number of targets available to pharmaceutical 
researchers, Incyte genomic information has demonstrably accelerated the 
development of new drugs. 

For at least the above reasons, Applicants respectfully request the Examiner to withdraw 
this rejection as it may apply to new Claims 33, 34, and 35. 



3. Rejection of Claims 19, 20, and 22 Under 35 U.S.C. § 112, first paragraph, written
description

New Claims 33, 34, and 35 correspond to canceled Claims 19, 20, and 22. The Examiner 
maintained the rejection of Claims 19, 20, and 22 under 35 U.S.C. § 112, first paragraph, "as
containing subject matter which was not described in the specification in such a way as to 
reasonably convey to one skilled in the relevant art that the inventor(s), at the time the 
application was filed, had possession of the claimed invention." (Office Action, page 5.) In 
particular, the Office Action asserts that the Specification does not provide adequate written 
description of the polynucleotide "variants" recited by the claims. This rejection is respectfully 
traversed. 

The requirements necessary to fulfill the written description requirement of 35 U.S.C.
§ 112, first paragraph, are well established by case law.

... the applicant must also convey with reasonable clarity to those skilled 
in the art that, as of the filing date sought, he or she was in possession of the 
invention. The invention is, for purposes of the "written description" inquiry, 
whatever is now claimed. Vas-Cath, Inc. v. Mahurkar, 19 USPQ2d 1111, 1117
(Fed. Cir. 1991).

Attention is also drawn to the Patent and Trademark Office's own "Guidelines for 
Examination of Patent Applications Under the 35 U.S.C. Sec. 112, para. 1", published January 5,
2001, which provide that:

An applicant may also show that an invention is complete by disclosure of 
sufficiently detailed, relevant identifying characteristics which provide evidence
that applicant was in possession of the claimed invention, i.e., complete or
partial structure, other physical and/or chemical properties, functional
characteristics when coupled with a known or disclosed correlation between
function and structure, or some combination of such characteristics. What is
conventional or well known to one of ordinary skill in the art need not be
disclosed in detail. If a skilled artisan would have understood the inventor to be
in possession of the claimed invention at the time of filing, even if every nuance
of the claims is not explicitly described in the specification, then the adequate
description requirement is met.

Thus, the written description standard is fulfilled by both what is specifically disclosed 
and what is conventional or well known to one skilled in the art. 

SEQ ID NO:1 and SEQ ID NO:2 are specifically disclosed in the application (see, for
example, the Sequence Listing, pages 35 through 40). Variants of SEQ ID NO:2 are described,
for example, at page 7, lines 20-26. Murine Jak2 kinase having 92% sequence identity to the 
human Jak2 kinase of the present invention is described at, e.g., page 3, lines 31-34 and Figure 2. 
Incyte clones in which the nucleic acids encoding human Jak2 kinase were first identified and 
libraries from which those clones were isolated are described, for example, at page 3, lines 24 
through 29 of the Specification. Chemical and structural features of the human Jak2 kinase are 
described, for example, on page 3, lines 31 through 36. Given SEQ ID NO:1 and SEQ ID NO:2,
one of ordinary skill in the art would recognize naturally-occurring variants having greater than
92% sequence identity to SEQ ID NO:1 and SEQ ID NO:2, respectively. The specification
describes how to use BLAST to determine whether a given sequence falls within the "greater 
than 92% sequence identity" scope (e.g., page 19, line 16 through page 20, line 27). 
Accordingly, the Specification provides an adequate written description of the recited 
polynucleotide and polypeptide sequences. 
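
By way of illustration only, the "greater than 92% sequence identity" test referred to above can be sketched in a few lines of Python. The sketch assumes that the two sequences have already been aligned (for example, by BLAST) and padded with "-" gap characters to equal length. The function names percent_identity and within_claimed_scope are hypothetical and do not appear in the Specification, and the simple column count shown here is a stand-in for, not a reproduction of, the BLAST procedure.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns holding the same residue.

    Assumption: inputs are pre-aligned and equal in length; a column
    containing a gap character '-' is counted as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment must not be empty")
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-"
    )
    return 100.0 * matches / len(aligned_a)


def within_claimed_scope(aligned_a: str, aligned_b: str,
                         threshold: float = 92.0) -> bool:
    """True only when identity is strictly greater than the threshold."""
    return percent_identity(aligned_a, aligned_b) > threshold
```

Note that the comparison is strict: a sequence at exactly 92% identity would fall outside a "greater than 92%" limitation under this sketch.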

A. The present claims specifically define the claimed genus through the 
recitation of chemical structure 

Court cases in which "DNA claims" have been at issue commonly emphasize that the 
recitation of structural features or chemical or physical properties are important factors to 
consider in a written description analysis of such claims. For example, in Fiers v. Revel, 25
USPQ2d 1601, 1606 (Fed. Cir. 1993), the court stated that:

If a conception of a DNA requires a precise definition, such as by structure, 
formula, chemical name or physical properties, as we have held, then a description 
also requires that degree of specificity. 

In a number of instances in which claims to DNA have been found invalid, the courts
have noted that the claims attempted to define the claimed DNA in terms of functional
characteristics without any reference to structural features. As set forth by the court in University
of California v. Eli Lilly and Co., 43 USPQ2d 1398, 1406 (Fed. Cir. 1997):

In claims to genetic material, however, a generic statement such as "vertebrate 
insulin cDNA" or "mammalian insulin cDNA," without more, is not an adequate
written description of the genus because it does not distinguish the claimed genus 
from others, except by function. 

Thus, the mere recitation of functional characteristics of a DNA, without the definition of 
structural features, has been a common basis by which courts have found invalid claims to DNA. 
For example, in Lilly, 43 USPQ2d at 1407, the court found invalid for violation of the written 
description requirement the following claim of U.S. Patent No. 4,652,525: 

1. A recombinant plasmid replicable in procaryotic host containing within its
nucleotide sequence a subsequence having the structure of the reverse transcript of 
an mRNA of a vertebrate, which mRNA encodes insulin. 

In Fiers, 25 USPQ2d at 1603, the parties were in an interference involving the following
count:

A DNA which consists essentially of a DNA which codes for a human fibroblast 
interferon-beta polypeptide. 

Party Revel in the Fiers case argued that its foreign priority application contained an 
adequate written description of the DNA of the count because that application mentioned a 
potential method for isolating the DNA. The Revel priority application, however, did not have a 
description of any particular DNA structure corresponding to the DNA of the count. The court 
therefore found that the Revel priority application lacked an adequate written description of the 
subject matter of the count. 

Thus, in Lilly and Fiers, nucleic acids were defined on the basis of functional 
characteristics and were found not to comply with the written description requirement of 35 
U.S.C. §112; i.e., "an mRNA of a vertebrate, which mRNA encodes insulin" in Lilly, and "DNA
which codes for a human fibroblast interferon-beta polypeptide" in Fiers. In contrast to the
situation in Lilly and Fiers, the claims at issue in the present application define polynucleotides 
and polypeptides in terms of chemical structure, rather than on functional characteristics. For 
example, the "variant language" of independent Claim 36 recites chemical structure to define the 
claimed genus: 

36. An isolated polynucleotide comprising a polynucleotide sequence
selected from the group consisting of: 

a) the polynucleotide sequence of SEQ ID NO: 1, 

b) a naturally occurring polynucleotide sequence having greater than 92% 
sequence identity to the polynucleotide sequence of SEQ ID NO: 1, 

c) a polynucleotide sequence complementary to a), 

d) a polynucleotide sequence complementary to b), and 

e) an RNA equivalent of a)-d). 

From the above it should be apparent that the claims of the subject application are 
fundamentally different from those found invalid in Lilly and Fiers. The subject matter of the 
present claims is defined in terms of the chemical structure of SEQ ID NO:2. In the present case, 
there is no reliance merely on a description of functional characteristics of the polynucleotides 
and polypeptides recited by the claims. In fact, there is no recitation of functional characteristics 
for the claimed polynucleotides encoding polypeptide variants. Moreover, if such functional 
recitations were included, they would add to the structural characterization of the recited
polynucleotides and polypeptides. The polynucleotides and polypeptides defined in the claims of 
the present application recite structural features, and cases such as Lilly and Fiers stress that the 
recitation of structure is an important factor to consider in a written description analysis of claims 
of this type. By failing to base its written description inquiry "on whatever is now claimed," the 
Office Action failed to provide an appropriate analysis of the present claims and how they differ 
from those found not to satisfy the written description requirement in Lilly and Fiers. 

B. The present claims do not define a genus which is highly variant 

Furthermore, the claims at issue do not describe a genus which could be characterized as 
highly variant. Available evidence illustrates that the claimed genus is of narrow scope. 

The case of University of California v. Eli Lilly and Co., 43 USPQ2d 1398 (Fed. Cir. 
1997) provides support for concluding that the polynucleotide genus defined by the present 
claims complies with the written description requirement. As discussed above, certain claims of 
U.S. Patent No. 4,652,525 were found invalid for failing to satisfy the written description
requirement. The Lilly case, however, also considered U.S. Patent No. 4,431,740. While there is
a discussion in Lilly of issues of infringement and enforceability of the claims of the '740 patent,
there is no written description analysis of the claims of the '740 patent. However, there was no
holding of invalidity of any claim of the '740 patent. Thus, the claims of the '740 patent are
presumed to satisfy the written description requirement of 35 U.S.C. §112. See 35 U.S.C. §282. Now
consider, for example, claim 4 of the '740 patent, which reads as follows:

4. A DNA transfer vector comprising a deoxynucleotide sequence coding for 
human pre-proinsulin consisting essentially of a plus strand having the sequence: 

5'- -24 GCL-23 X-22 TY-22 TGG-21 ATG-20 W-19 GZ-19 X-18 TY-18 X-17 TY-17 CCL-16
X-15 TY-15 X-14 TY-14 GCL-13 X-12 TY-12 X-11 TY-11 GCL-10 X-9 TY-9 TGG-8 GGL-7
CCL-6 GAK-5 CCL-4 GCL-3 GCL-2 GCL-1 TTK1 GTL2 AAK3 CAJ4 CAK5 X6 TY6 TGK7
GGL8 QR9 S9 CAK10 X11 TY11 GTL12 GAJ13 GCL14 X15 TY15 TAK16 X17 TY17
GTL18 TGK19 GCL20 GAJ21 W22 GZ22 GCL23 TTK24 TTK25 TAK26 ACL27 CCL28
AAJ29 ACL30 W31 GZ31 W32 GZ32 GAJ33 GCL34 GAJ35 GAK36 X37 TY37 CAJ38
GTL39 GGL40 CAJ41 GTL42 GAJ43 X44 TY44 GGL45 GGL46 GGL47 CCL48 GGL49
GCL50 GGL51 QR52 S52 X53 TY53 CAJ54 CCL55 X56 TY56 GCL57 X58 TY58 GAJ59
GGL60 QR61 S61 X62 TY62 CAJ63 AAJ64 W65 GZ65 GGL66 ATM67 GTL68 GAJ69
CAJ70 TGK71 TGK72 ACL73 QR74 S74 ATM75 TGK76 QR77 S77 X78 TY78 TAK79
CAJ80 X81 TY81 GAJ82 AAK83 TAK84 TGK85 AAK86

TAGACGCAGCCCGCAGGCAGCCCCCCACCCGCCGCCTCCTGCACCGAG
AGAGATGGAATAAAGCCCTTGAACCAGC polyA-3'
wherein 

A is deoxyadenyl, 
G is deoxyguanyl, 
C is deoxycytosyl, 
T is thymidyl, 
J is A or G; 
K is T or C; 
L is A, T, C, or G; 
M is A, C or T; 

Xn is T or C if Yn is A or G; and C if Yn is C or T;
Yn is A, G, C or T if Xn is C, and A or G if Xn is T;
Wn is C or A if Zn is G or A, and C if Zn is C or T;
Zn is A, G, C or T if Wn is C, and A or G if Wn is A;
QRn is TC if Sn is A, G, C or T, and AG if Sn is T or C;
Sn is A, G, C or T if QRn is TC, and T or C if QRn is AG; and, subscript numerals, n,
refer to the position in the amino acid sequence of human proinsulin,
to which each triplet in the nucleotide sequence corresponds, according to the
genetic code, the amino acid positions being numbered from the amino end.

79352
09/467,100
Docket No.: PF-0049-2 DIV

Claim 4 of the '740 patent recites a DNA sequence which includes the coding region for 
human pre-proinsulin; in particular, the 330 nucleotide bases from codon -GCL24 through codon 
AAK86 code for human pre-proinsulin. As can be seen from the claim language, claim 4 of the
'740 patent sets forth a DNA structure with numerous variant positions. Of the 330 nucleotides 
in the coding region for human pre-proinsulin, 141 are potentially variant positions within the 
structure defined by claim 4. Thus, claim 4 of the '740 patent defines a DNA which potentially 
is only 57% identical (189/330 x 100% = 57%) to the single species of human pre-proinsulin
actually sequenced in the '740 patent. See Example 1 and Figure 2. As discussed above, the 
present claims encompass naturally occurring polynucleotide variants which have greater than
92% sequence identity to the sequence of SEQ ID NO: 1. Clearly, then, the genus variation of the 
present claims is less than that of claim 4 of the '740 patent. 
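
The arithmetic behind the 57% figure can be checked with a short calculation. The following Python sketch is illustrative only: the function name min_identity_percent is hypothetical, and the inputs (330 coding nucleotides, 141 potentially variant positions) are the figures stated above.

```python
def min_identity_percent(total_positions: int, variant_positions: int) -> float:
    """Lowest possible percent identity to a reference sequence when
    every potentially variant position differs from the reference."""
    if total_positions <= 0:
        raise ValueError("total_positions must be positive")
    if not 0 <= variant_positions <= total_positions:
        raise ValueError("variant_positions must lie between 0 and total_positions")
    fixed_positions = total_positions - variant_positions
    return 100.0 * fixed_positions / total_positions


# Figures taken from the discussion of claim 4 of the '740 patent.
floor_740 = min_identity_percent(330, 141)   # 189 fixed positions out of 330
print(f"{floor_740:.2f}% minimum identity")  # 57.27% minimum identity
print(floor_740 < 92.0)                      # True: a broader genus than "greater than 92%"
```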

C. The state of the art at the time of the present invention is further advanced 
than at the time of the Lilly and Fiers applications

In the Lilly case, claims of U.S. Patent No. 4,652,525 were found invalid for failing to 
comply with the written description requirement of 35 U.S.C. §112. The '525 patent claimed the 
benefit of priority of two applications, Application Serial No. 801,343 filed May 27, 1977, and
Application Serial No. 805,023 filed June 9, 1977. In the Fiers case, party Revel claimed the 
benefit of priority of an Israeli application filed on November 21, 1979. Thus, the written 
description inquiry in those cases was based on the state of the art at essentially the "dark ages"
of recombinant DNA technology. 

The present application has a priority date of December 5, 1995. Much has happened in 
the development of recombinant DNA technology in the 16 or more years between the filing
of the applications involved in Lilly and Fiers and the present application. For example, the
technique of polymerase chain reaction (PCR) was invented. Highly efficient cloning and DNA
sequencing technology has been developed. Large databases of protein and nucleotide sequences 
have been compiled. Much of the raw material of the human and other genomes has been 
sequenced. With these remarkable advances one of skill in the art would recognize that, given 
the sequence information of SEQ ID NO:1 and SEQ ID NO:2, and the additional extensive detail
provided by the subject application, the present inventors were in possession of the claimed 
polynucleotide variants at the time of filing of this application. 

D. Summary 

The Office Action failed to base its written description inquiry "on whatever is now 
claimed." Consequently, the Action did not provide an appropriate analysis of the present claims 
and how they differ from those found not to satisfy the written description requirement in cases 
such as Lilly and Fiers. In particular, the claims of the subject application are fundamentally
different from those found invalid in Lilly and Fiers. The subject matter of the present claims is 
defined in terms of the chemical structure of SEQ ID NO:1 and SEQ ID NO:2. The courts have
stressed that structural features are important factors to consider in a written description analysis 
of claims to nucleic acids and proteins. In addition, the genus of polynucleotides defined by the 
present claims is adequately described, as evidenced by Brenner et al. and consideration of the 
claims of the '740 patent involved in Lilly. Furthermore, there have been remarkable advances in
the state of the art since the Lilly and Fiers cases, and these advances were given no 
consideration whatsoever in the position set forth by the Office Action. 

For at least the above reasons, Applicants respectfully request that the Examiner 
withdraw the written description rejection. 

4. Rejection of Claims 19, 20, 22-25, and 29 Under 35 U.S.C. § 112, first paragraph, written
description

New Claims 33, 34, 35, 37, 38, 39, and 40 correspond to canceled Claims 19, 20, 22-25, 
and 29. The Examiner rejected Claims 19, 20, 22-25, and 29 under 35 U.S.C. §112, first 
paragraph, as allegedly containing new matter with respect to the recitation of 90% or 95% 
sequence identity. 

As amended, the Claims recite "greater than 92% sequence identity." The Specification,
e.g., at page 3, lines 31-34 and page 7, lines 20-26, indicates that Applicants envisioned Jak2
kinase variants of greater than 92% sequence identity to murine Jak2 kinase. Hence, the 
Specification provides adequate written description of the "variant language" in the claims. 

For at least the above reasons, Applicants respectfully request that the Examiner 
withdraw the "new matter" rejection. 

5. Rejection of Claims 23-25 and 27-29 Under 35 U.S.C. § 103(a) as Being Unpatentable
over Silvennoinen et al. 

Claims 23-25 and 27-29 have been canceled. Claims 23-25 and 29 are replaced by new 
Claims 37, 38, 39, and 40. The Examiner rejected Claims 23-25 and 27-29 under 35 U.S.C.
§ 103(a) as being unpatentable over Silvennoinen et al. The Examiner alleged that "[o]ne of 
ordinary skill in the art at the time of filing would be motivated to use the sequence taught by 
Silvennoinen et al. to design oligomers for use as primers to amplify and determine the level of 
mRNA encoding the murine Jak2 protein or to isolate other mRNAs encoding related proteins 
such as human Jak2 using hybridization or polymerase chain reaction methodology." (Office 
Action, page 8.) 

Applicants submit that the novel target polynucleotide recited by the claims is not 
disclosed by Silvennoinen. Without the coding information provided by the target sequence, 
Silvennoinen could not have guided one of skill in the art on how to detect the target sequence. 

For at least the above reasons, Applicants respectfully request that the Examiner 
withdraw the rejection over Silvennoinen et al. 

6. Rejection of Claims 19, 20, and 22 Under the Judicially Created Doctrine of Double
Patenting over Claims 1-3 of U.S. Patent No. 5,914,393 

Applicants respectfully request the Examiner to hold this rejection in abeyance until there 
is an indication of allowable subject matter. 

7. Rejection of Claims 23-25 and 27-29 Under the Judicially Created Doctrine of
Obviousness-Type Double Patenting over Claims 1-3 of U.S. Patent No. 5,914,393

Applicants respectfully request the Examiner to hold this rejection in abeyance until there
is an indication of allowable subject matter.

CONCLUSION 

In light of the above amendments and remarks, Applicants submit that the present
application is fully in condition for allowance, and request that the Examiner withdraw the 
outstanding rejections. Early notice to that effect is earnestly solicited. 

If the Examiner contemplates other action, or if a telephone conference would expedite 
allowance of the claims, Applicants invite the Examiner to contact Applicants' Agent at 
(650) 845-4646. 

Applicants believe that no fee is due with this communication. However, if the USPTO 
determines that a fee is due, the Commissioner is hereby authorized to charge Deposit Account 
No. 09-0108. This form is enclosed in duplicate. 



Respectfully submitted, 
INCYTE GENOMICS, INC. 





Susan K. Sather 
Reg. No. 44,316 

Direct Dial Telephone: (650) 845-4646 



3160 Porter Drive
Palo Alto, California 94304 
Phone: (650) 855-0555 
Fax: (650) 849-8886 




VERSION WITH MARKINGS TO SHOW CHANGES MADE 

IN THE SPECIFICATION: 
Paragraph beginning at page 4, line 28 has been amended as follows: 

Antisense molecules, antibodies, antagonists or inhibitors (including proteins, peptides, 
oligopeptides or organic molecules capable of compromising or modulating HJAK2 expression) 
may also be used for therapeutic purposes, for example, in neutralizing the aberrant [abberrent] 
activity of a HJAK2 associated with, for example, inflammation or oncogenesis. The present 
invention also provides for pharmaceutical compositions for the treatment of disease states 
associated with aberrant expression of hjak2 comprising the aforementioned [forementioned] 
antisense molecules, antibodies, antagonists or inhibitors. 

Paragraph beginning at page 15, line 1 has been amended as follows: 

This same assay, combining a sample with the nucleotide sequence, is applicable in 
evaluating the efficacy of a particular therapeutic treatment regime. It may be used in animal 
studies, in clinical trials, or in monitoring the treatment of an individual patient. First, standard 
expression must be established for use as a basis of comparison. Second, samples from the 
animals or patients affected by a disorder or disease are combined with the nucleotide sequence 
to evaluate the deviation from the standard or normal profile. Third, an entirely new or 
pre-existing therapeutic agent is administered, and a treatment profile is generated. This post- 
treatment [posat-treatment] assay is evaluated to determine whether the patient profile progresses 
toward or returns to the standard pattern. Successive treatment profiles may be used to show the 
efficacy of treatment over a period of several days or several months. 

Paragraph beginning at page 17, line 32 and ending on page 18, line 6, has been 
amended as follows: 

The cDNA library was constructed from normal placenta. The tissue was lysed in a buffer 
containing guanidinium isothiocyanate. The lysate was extracted with phenol chloroform and 
precipitated with ethanol. Poly A+ RNA was isolated using biotinylated oligo d(T) primer and 
streptavidin [steptavidin] coupled to a paramagnetic particle (Promega Corp. Madison WI) and 
sent to Stratagene (La Jolla CA) for cDNA library preparation. The cDNA synthesis was primed 
using both oligo d(T) and random hexamers, and the two cDNA libraries were treated separately. 
Synthetic adapter oligonucleotides were ligated onto the ends of the cDNAs which were digested 
with XhoI and inserted into the UNIZAP vector system (Stratagene). 

Paragraph beginning at page 23, line 11 has been amended as follows: 

Knowledge of the correct cDNA sequence of this Jak2 kinase or its regulatory elements 
enable its use as a tool in sense (Youssoufian H and HF Lodish (1993) Mol Cell Biol 13:98-104) 
or antisense (Eguchi et al (1991) Annu Rev Biochem 60:631-652) technologies for the 
investigation or alteration of gene expression. To inhibit in vivo or in vitro hjak2 [cdp] 
expression, an oligonucleotide based on the coding sequence of an hjak2 designed with OLIGO 
4.0 software (National Biosciences Inc) is used. Alternatively, a fragment of an hjak2 is 
produced by digesting hjak2 coding sequence with restriction enzymes. These enzymes and 
specific restrictions sites may be selected using INHERIT analysis software (Applied 
Biosystems), and the strands separated by heating the fragments and selecting for the antisense 
strand. Either the oligonucleotide or the fragment may be used to inhibit hjak2 expression. 
Furthermore, antisense molecules can be designed to inhibit promoter binding in the upstream 
nontranslated leader or at various sites along the hjak2 coding region. Alternatively, antisense 
molecules may be designed to inhibit translation of an mRNA into polypeptide by preparing an 
oligomer or fragment which will bind in the region spanning approximately -10 to +10 
nucleotides at the 5' end of the coding sequence. These technologies are now well known to 
those of in the art. 

Paragraph beginning at page 26, line 18 has been amended as follows: 

The sequence for HJAK2 in this application present many different domains (and 
subdomains as detailed in the background of the invention) which may be utilized: 1) 
individually for the production of antibodies, 2) in functional groups (eg. to span a membrane), 
and 3) as interchangeable [interchangable], usable parts of a chimeric kinase. For example, a 
known, full length kinase such as the hjak2 kinase of this application may be used to swap 
related portions of the nucleic acid sequence, analogous to domains or subdomains of MAP 
kinase polypeptides. The chimeric nucleotides, so produced, may be introduced into prokaryotic 
host cells (as reviewed in Strosberg AD and Marullo S (1992) Trends Pharma Sci 13:95-98) or 
eukaryotic host cells. These host cells are then employed in procedures to determine what 
molecules activate the kinase or what molecules are activated by a kinase. Such activating or 
activated molecules may be of extracellular, intracellular, biologic or chemical origin. 

Paragraph beginning at page 30, line 2 has been amended as follows: 

Polyclonal immunoglobulins are prepared from immune sera either by precipitation with 
ammonium sulfate or by purification on immobilized Protein A (Pharmacia Biotech). Likewise, 
monoclonal antibodies are prepared from mouse ascites fluid by ammonium sulfate precipitation 
or chromatography on immobilized Protein A. Partially purified immunoglobulin is covalently 
attached to a chromatographic resin such as CNBr-activated [CnBr-activated] SEPHAROSE 
(Pharmacia Biotech). The antibody is coupled to the resin, the resin is blocked, and the 
derivative resin is washed according to the manufacturer's instructions. 



IN THE CLAIMS: 



Claims 5 and 14-29 have been canceled without prejudice. 



New Claims 30-42 have been added. 

Abstract 

Recent progress in genomics and proteomics technologies has created a unique opportunity to significantly impact 
the pharmaceutical drug development processes. The perception that cells and whole organisms express specific 
inducible responses to stimuli such as drug treatment implies that unique expression patterns, molecular fingerprints, 
indicative of a drug's efficacy and potential toxicity are accessible. The integration into state-of-the-art toxicology of 
assays allowing one to profile treatment-related changes in gene expression patterns promises new insights into 
mechanisms of drug action and toxicity. The benefits will be improved lead selection, and optimized monitoring of 
drug efficacy and safety in pre-clinical and clinical studies based on biologically relevant tissue and surrogate markers. 
© 2000 Elsevier Science Ireland Ltd. All rights reserved. 

Keywords: Proteomics; Genomics; Toxicology 




1. Introduction 

The majority of drugs act by binding to protein 
targets, most to known proteins representing en- 
zymes, receptors and channels, resulting in effects 
such as enzyme inhibition and impairment of 
signal transduction. The treatment-induced per- 
turbations provoke feedback reactions aiming to 
compensate for the stimulus, which almost always 
are associated with signals to the nucleus, resulting 
in altered gene expression. Such gene expression 
regulations account for both the 
pharmacological action and the toxicity of a drug 
and can be visualized by either global mRNA or 
global protein expression profiling. Hence, for 
each individual drug, a characteristic gene regula- 
tion pattern, its molecular fingerprint, exists 
which bears valuable information on its mode of 
action and its mechanism of toxicity. 

Gene expression is a multistep process that 
results in an active protein (Fig. 1). There exist 
numerous regulation systems that exert control at 
and after the transcription and the translation 
step. Genomics, by definition, encompasses the 
quantitative analysis of transcripts at the mRNA 
level, while the aim of proteomics is to quantify 
gene expression further down-stream, creating a 
snapshot of gene regulation closer to ultimate cell 
function control. 

2. Global mRNA profiling 

Expression data at the mRNA level can be produced using a set of different technologies 
such as DNA microarrays, reverse transcript imaging, amplified fragment length polymorphism 
(AFLP), serial analysis of gene expression (SAGE) and others. Currently, DNA microarrays 
are very popular and promise a great potential. On a typical array, each gene of interest is 
represented either by a long DNA fragment (200-2400 bp) typically generated by polymerase 
chain reaction (PCR) and spotted on a suitable substrate using robotics (Schena et al., 1995; 
Shalon et al., 1996) or by several short oligonucleotides (20-30 bp) synthesized directly onto 
a solid support using photolabile nucleotide chemistry (Fodor et al., 1991; Chee et al., 1996). 
From control and treated tissues, total RNA or mRNA is isolated and reverse transcribed in the 
presence of radioactive or fluorescent labeled nucleotides, and the labeled probes are then 
hybridized to the arrays. The intensity of the array signal is measured for each gene transcript 
by either autoradiography or laser scanning confocal microscopy. The ratio between the signals 
of control and treated samples reflects the relative drug-induced change in transcript 
abundance. 
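
The ratio described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the gene names, intensity values, background floor and two-fold cut-off are hypothetical and do not come from the article.

```python
import math


def log2_ratios(control, treated, floor=1.0):
    """Return log2(treated / control) for every gene measured in both samples.

    `floor` is an assumed lower bound on signal intensity; it keeps
    near-zero background readings from producing extreme ratios.
    """
    ratios = {}
    for gene in control.keys() & treated.keys():
        c = max(control[gene], floor)
        t = max(treated[gene], floor)
        ratios[gene] = math.log2(t / c)
    return ratios


def changed_genes(ratios, fold=2.0):
    """Genes whose signal changed at least `fold`-fold in either direction."""
    cutoff = math.log2(fold)
    return sorted(g for g, r in ratios.items() if abs(r) >= cutoff)


# Hypothetical array intensities in arbitrary units.
control = {"geneA": 100.0, "geneB": 400.0, "geneC": 50.0}
treated = {"geneA": 400.0, "geneB": 410.0, "geneC": 12.5}

ratios = log2_ratios(control, treated)
print(changed_genes(ratios))
```

Working on a log scale treats a four-fold increase and a four-fold decrease symmetrically, which is why the cut-off is applied to the absolute value of the log ratio.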



Toxicology Letters 112-113 (2000) 467-471 

3. Global protein profiling 

Global quantitative expression analysis at protein level is currently restricted to the use of 
two-dimensional gel electrophoresis. This technique combines separation of tissue proteins by 
isoelectric focusing in the first dimension and by sodium dodecyl sulfate slab gel 
electrophoresis-based molecular weight separation in the second, orthogonal dimension 
(Anderson et al., 1991). The product is a rectangular pattern of protein spots that are 
typically revealed by Coomassie Blue, silver or fluorescent staining (Fig.). Protein spots are 
identified by mass spectrometry, following generation of peptide mass fingerprints 
(Mann et al., 1993) and sequence tags (Wilkins et al., 1996). Similar to the mRNA approach, 
the optical densities of spots from control and treated samples are compared to search for 
treatment-related changes. 



Fig. 1. Production of an active protein is a multistep process in which numerous regulation systems exert control at various stages 
of expression. Molecular fingerprints of drugs can be visualized through expression profiling at the mRNA level (genomics) using 
a variety of technologies and at the protein level (proteomics) using two-dimensional gel electrophoresis. 

4. Expression data analysis 

Bioinformatics forms a key element required to organize, analyze and store expression data from 
either source, the mRNA or the protein level. The overall objective, once a mass of high-quality 
quantitative expression data has been collected, is to visualize complex patterns of gene 
expression changes, to detect pathways and sets of genes tightly correlated with treatment 
efficacy and toxicity, and to compare the effects of different sets of treatment 
(Anderson et al., 1996). As the drug effect database is growing, one may detect similarities and 
differences between the molecular fingerprints produced by various drugs, information that may 
be crucial to make a decision whether to refocus or extend the therapeutic spectrum of a 
drug candidate. 
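
One simple way to compare molecular fingerprints is to correlate their expression-change profiles. The sketch below is a hedged illustration, not the method used by the authors: it assumes each fingerprint is a list of log ratios over the same ordered set of genes, and the drug names and values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long expression profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("a constant profile has no defined correlation")
    return cov / (sd_x * sd_y)


def most_similar(query, database):
    """Name of the stored fingerprint that correlates best with `query`."""
    return max(database, key=lambda name: pearson(query, database[name]))


# Hypothetical fingerprints: log ratios for the same four genes, in the same order.
database = {
    "drug_X": [2.0, -1.0, 0.0, 1.5],
    "drug_Y": [-1.5, 1.0, 0.2, -2.0],
}
query = [1.8, -0.9, 0.1, 1.2]

best = most_similar(query, database)
print(best)
```

Correlation ignores the overall magnitude of the response, so two drugs that regulate the same genes in the same directions score as similar even if one acts more strongly.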



5. Comparison of global mRNA and protein expression profiling 

There are several synergies and overlaps of data obtained by mRNA and protein expression 
analysis. Low abundant transcripts may not be easily quantified at the protein level using 
standard two-dimensional gel electrophoresis analysis and their detection may require 
prefractionation of samples. The expression of such genes may be preferably quantified at the 
mRNA level using techniques allowing PCR-mediated target amplification. Tissue biopsy samples 
typically yield good quality of both mRNA and proteins; however, the quality of mRNA isolated 
from body fluids is often poor due to the faster degradation of mRNA when compared with 
proteins. RNA samples from body fluids such as serum or urine are often not very meaningful, 
and secreted proteins are likely more reliable surrogate markers for treatment efficacy and 
safety. Detection of post-translational modifications, events often related to function or 
nonfunction of a protein, is restricted to protein expression analysis and rarely can be 
predicted by mRNA profiling. Information on subcellular localization and translocation of 
proteins has to be acquired at the level of the protein in combination with sample 
prefractionation procedures. The growing evidence of a poor correlation between mRNA and 
protein abundance (Anderson and Seilhamer, 1997) further suggests that the two approaches, 
mRNA and protein profiling, are complementary and should be applied in parallel. 



6. Expression profiling and drug development 

Understanding the mechanisms of action and toxicity, and being able to monitor treatment 
efficacy and safety during trials is crucial for the successful development of a drug. 
Mechanistic insights are essential for the interpretation of drug effects and enhance the 
chances of recognizing potential species specificities contributing to an improved risk profile 
in humans (Richardson et al., 1993; Steiner et al., 1996b; Aicher et al., 1998). The value of 
expression profiling further increases when links between treatment-induced expression profiles 
and specific pharmacological and toxic endpoints are established (Anderson et al., 1991, 1995, 
1996; Steiner et al., 1996a). Changes in gene expression are known to precede the manifestation 
of morphological alterations, giving expression profiling a great potential for early compound 
screening, enabling one to select drug candidates with wide therapeutic windows reflected by 
molecular fingerprints indicative of high pharmacological potency and low toxicity 
(Arce et al., 1998). In later phases of drug devel- 

7. Perspectives 

The basic methodology of safety evaluation has changed little during the past decades. Toxicity 
in laboratory animals has been evaluated primarily by using hematological, clinical chemistry 
and histological parameters as indicators of organ damage. The rapid progress in genomics and 
proteomics technologies creates a unique opportunity to dramatically improve the predictive 
power of safety assessment and to accelerate the drug development process. Application of gene 
and protein expression profiling promises to improve lead selection, resulting in the 
development of drug candidates with higher efficacy and lower toxicity. The identification of 
biologically relevant surrogate markers correlated with treatment efficacy and safety bears a 
great potential to optimize the monitoring of pre-clinical and clinical trials. 

Differential gene expression in drug metabolism and 
toxicology: practicalities, problems and potential 

JOHN C. ROCKETT†, DAVID J. ESDAILE‡ 
and G. GORDON GIBSON* 

Molecular Toxicology Laboratory, School of Biological Sciences, University of Surrey 

Received January 8, 1999 

1. An important feature of the work of many molecular biologists is identifying which 
genes are switched on and off in a cell under different environmental conditions or 
subsequent to xenobiotic challenge. Such information has many uses, including the 
deciphering of molecular pathways and facilitating the development of new experimental 
and diagnostic procedures. However, the student of gene hunting should be forgiven for 
perhaps becoming confused by the mountain of information available as there appears to be 
almost as many methods of discovering differentially expressed genes as there are research 
groups using the technique. 

2. The aim of this review was to clarify the main methods of differential gene expression 
analysis and the mechanistic principles underlying them. Also included is a discussion on 
some of the practical aspects of using this technique. Emphasis is placed on the so-called 
'open' systems, which require no prior knowledge of the genes contained within the study 
model. Whilst these will eventually be replaced by 'closed' systems in the study of human, 
mouse and other commonly studied laboratory animals, they will remain a powerful tool for 
those examining less fashionable models. 

3. The use of suppression-PCR subtractive hybridization is exemplified in the 
identification of up- and down-regulated genes in rat liver following exposure to 
phenobarbital, a well-known inducer of the drug metabolizing enzymes. 

4. Differential gene display provides a coherent platform for building libraries and 
microchip arrays of 'gene fingerprints' characteristic of known enzyme inducers and 
xenobiotic toxicants, which may be interrogated subsequently for the identification and 
characterization of xenobiotics of unknown biological properties. 



Introduction 

It is now apparent that the development of almost all cancers and many non-neoplastic 
diseases are accompanied by altered gene expression in the affected cells compared to their 
normal state (Hunter 1991, Wynford-Thomas 1991, Vogelstein and Kinzler 1993, Semenza 1994, 
Cassidy 1995, Kleinjan and Van Heyningen 1998). Such changes also occur in response to 
external stimuli such as pathogenic micro-organisms (Rohn et al. 1996, Singh et al. 1997, 
Griffin and Krishna 1998, Lunney 1998) and xenobiotics (Sewall et al. 1995, Dogra et al. 1998, 
Ramana and Kohli 1998), as well as during the development of undifferentiated cells 
(Hecht 1998, Rudin and Thompson 1998, Schneider-Maunoury et al. 1998). The potential 
medical and therapeutic benefits of understanding the molecular changes which occur in any 
given cell in progressing from the normal to the 'altered' state are enormous. Such profiling 
essentially provides a 'fingerprint' of each step of a cell's development or response and 
should help in the elucidation of specific and sensitive biomarkers representing, for example, 
different types of cancer or previous exposure to certain classes of chemicals that are 
enzyme inducers. 

In drug metabolism, many of the xenobiotic-metabolizing enzymes (including 
the well-characterized isoforms of cytochrome P450) are inducible by drugs and 
chemicals in man (Pelkonen et al. 1998), predominantly involving transcriptional 
activation of not only the cognate cytochrome P450 genes, but additional cellular 
proteins which may be crucial to the phenomenon of induction. Accordingly, the 
development of methodology to identify and assess the full complement of genes 
that are either up- or down-regulated by inducers are crucial in the development of 
knowledge to understand the precise molecular mechanisms of enzyme induction 
and how this relates to drug action. Similarly, in the field of chemical-induced 
toxicity, it is now becoming increasingly obvious that most adverse reactions to 
drugs and chemicals are the result of multiple gene regulation, some of which are 
causal and some of which are casually-related to the toxicological phenomenon per 
se. This observation has led to an upsurge in interest in gene-profiling technologies 
which differentiate between the control and toxin-treated gene pools in target tissues 
and is, therefore, of value in rationalizing the molecular mechanisms of 
xenobiotic-induced toxicity. Knowledge of toxin-dependent gene regulation in target tissues is 
not solely an academic pursuit as much interest has been generated in the 
pharmaceutical industry to harness this technology in the early identification of toxic 
drug candidates, thereby shortening the developmental process and contributing 
substantially to the safety assessment of new drugs. For example, if the gene profile 
in response to say a testicular toxin that has been well-characterized in vivo could be 
determined in the testis, then this profile would be representative of all new drug 
candidates which act via this specific molecular mechanism of toxicity, thereby 
providing a useful and coherent approach to the early detection of such toxicants. 
Whereas it would be informative to know the identity and functionality of all genes 
up/down regulated by such toxicants, this would appear a longer term goal, as the 
majority of human genes have not yet been sequenced, far less their functionality 
determined. However, the current use of gene profiling yields a pattern of gene 
changes for a xenobiotic of unknown toxicity which may be matched to that of 
well-characterized toxins, thus alerting the toxicologist to possible in vivo similarities 
between the unknown and the standard, thereby providing a platform for more 
extensive toxicological examination. Such approaches are beginning to gain 
momentum, in that several biotechnology companies are commercially producing 
'gene chips' or 'gene arrays' that may be interrogated for toxicity assessment of 
xenobiotics. These chips consist of hundreds/thousands of genes, some of which are 
degenerate, in the sense that not all of the genes are mechanistically-related to any 
one toxicological phenomenon. Whereas these chips are useful in broad-spectrum 
screening, they are maturing at a substantial rate, in that gene arrays are now 
becoming more specific, e.g. chips for the identification of changes in growth factor 
families that contribute to the aetiology and development of chemically-induced 
neoplasias. 

Although documenting and explaining these genetic changes presents a 
formidable obstacle to understanding the different mechanisms of development and 
disease progression, the technology is now available to begin attempting this difficult 
challenge. Indeed, several 'differential expression analysis' methods have been 
developed which facilitate the identification of gene products that demonstrate 
altered expression in cells of one population compared to another. These methods 
have been used to identify differential gene expression in many situations, including 
invading pathogenic microbes (Zhao et al. 1998), in cells responding to extracellular 
and intracellular microbial invasion (Duguid and Dinauer 1990, Ragno et al. 1997, 
Maldarelli et al. 1998), in chemically treated cells (Syed et al. 1997, Rockett et al. 
1999), neoplastic cells (Liang et al. 1992, Chang and Terzaghi-Howe 1998), 
activated cells (Gurskaya et al. 1996, Wan et al. 1996), differentiated cells (Hara et 
al. 1991, Guimaraes et al. 1995a, b), and different cell types (Davis et al. 1984, 
Hedrick et al. 1984, Xhu et al. 1998). Although differential expression analysis 
technologies are applicable to a broad range of models, perhaps their most important 
advantage is that, in most cases, absolutely no prior knowledge of the specific genes 
which are up- or down-regulated is required. 

The field of differential expression analysis is a large and complex one, with 
many techniques available to the potential user. These can be categorized into 
several methodological approaches, including: 

(1) Differential screening, 

(2) Subtractive hybridization (SH) (includes methods such as chemical cross-linking 
subtraction [CCLS], suppression-PCR subtractive hybridization [SSH], and 
representational difference analysis [RDA]), 

(3) Differential display (DD), 

(4) Restriction endonuclease facilitated analysis (including serial analysis of gene 
expression [SAGE] and gene expression fingerprinting [GEF]), 

(5) Gene expression arrays, and 

(6) Expressed sequence tag (EST) analysis. 

The above approaches have been used successfully to isolate differentially 
expressed genes in different model systems. However, each method has its own 
subtle (and sometimes not so subtle) characteristics which incur various advantages 
and disadvantages. Accordingly, it is the purpose of this review to clarify the 
mechanistic principles underlying the main differential expression methods and to 
highlight some of the broader considerations and implications of this very powerful 
and increasingly popular technique. Specifically, we will concentrate on the so-called 
'open' systems, namely those which do not require any knowledge of gene 
sequences and, therefore, are useful for isolating unknown genes. Two 'closed' 
systems (those utilising previously identified gene sequences), EST analysis and the 
use of DNA arrays, will also be considered briefly for completeness. Whilst 
emphasis will often be placed on suppression PCR subtractive hybridization (SSH, 
the approach employed in this laboratory), it is the aim of the authors to highlight, 
wherever possible, those areas of common interest to those who use, or intend to use, 
differential gene expression analysis. 



Differential cDNA library screening (DS) 

Despite the development of multiple technological advances which have recently 
brought the field of gene expression profiling to the forefront of molecular analysis, 
recognition of the importance of differential gene expression and characterization of 
differentially expressed genes has existed for many years. One of the original 
approaches used to identify such genes was described 20 years ago by St John and 
Davis (1979). These authors developed a method, termed 'differential plaque filter 
hybridization', which was used to isolate galactose-inducible DNA sequences from 
yeast. The theory is simple: a genomic DNA library is prepared from normal, 
unstimulated cells of the test organism/tissue and multiple filter replicas are 
prepared. These replica blots are probed with radioactively (or otherwise) labelled 
complex cDNA probes prepared from the control and test cell mRNA populations. 
Those mRNAs which are differentially expressed in the treated cell population will 
show a positive signal only on the filter probed with cDNA from the treated cells. 
Furthermore, labelled cDNA from different test conditions can be used to probe 
multiple blots, thereby enabling the identification of mRNAs which are only 
up-regulated under certain conditions. For example, St John and Davis (1979) screened 
replica filters with acetate-, glucose- and galactose-derived probes in order to obtain 
genes induced specifically by galactose metabolism. Although groundbreaking in its 
time, this method is now considered insensitive and time-consuming, as up to 2 
months are required to complete the identification of genes which are differentially 
expressed in the test population. In addition, there is no convenient way to check 
that the procedure has worked until the whole process has been completed. 
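
The selection logic behind replica-filter screening can be illustrated with a short Python sketch. It rests on assumptions that are not in the review: hybridization signals are taken to be already quantified per clone and per probe condition, and the clone names, signal values and positivity threshold are hypothetical.

```python
def condition_specific(signals, target, threshold=1.0):
    """Clones that are positive with the `target` probe and negative with all others.

    `signals` maps clone -> {probe condition -> hybridization signal}.
    `threshold` is an assumed cut-off separating a positive spot from background.
    """
    hits = []
    for clone, by_condition in signals.items():
        on_target = by_condition[target] >= threshold
        off_elsewhere = all(
            value < threshold
            for condition, value in by_condition.items()
            if condition != target
        )
        if on_target and off_elsewhere:
            hits.append(clone)
    return sorted(hits)


# Hypothetical signals from three replica filters of the same library.
signals = {
    "clone1": {"acetate": 0.1, "glucose": 0.2, "galactose": 5.0},  # galactose only
    "clone2": {"acetate": 3.0, "glucose": 3.0, "galactose": 3.0},  # constitutive
    "clone3": {"acetate": 0.1, "glucose": 0.1, "galactose": 0.1},  # not expressed
    "clone4": {"acetate": 2.0, "glucose": 0.1, "galactose": 4.0},  # two conditions
}

galactose_specific = condition_specific(signals, "galactose")
print(galactose_specific)
```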

Subtractive Hybridization (SH) 

The developing concept of differential gene expression and the success of early 
approaches such as that described by St John and Davis (1979) soon gave rise to a 
search for more convenient methods of analysis. One of the first to be developed was 
SH, numerous variations of which have since been reported (see below). In general, 
this approach involves hybridization of mRNA/cDNA from one population (tester) 
to excess mRNA/cDNA from another (driver), followed by separation of the 
unhybridized tester fraction (differentially expressed) from the hybridized common 
sequences. This step has been achieved physically, chemically and through the use 
of selective polymerase chain reaction (PCR) techniques. 
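
As a rough in silico analogy for the general scheme above, the subtraction can be modelled as removing from the tester pool every molecule that finds a partner in an excess of driver. The following Python sketch is a toy model only: it assumes ideal, complete hybridization, and the transcript names, copy numbers and 20-fold excess are hypothetical values chosen for illustration.

```python
from collections import Counter


def subtract(tester, driver, excess=20):
    """Return the tester molecules left unhybridized after one subtraction round.

    Each driver molecule is assumed to be present in `excess`-fold amount and
    to capture one matching tester molecule. Counter subtraction discards any
    species whose remaining count is zero or negative.
    """
    scaled_driver = Counter({seq: n * excess for seq, n in driver.items()})
    return tester - scaled_driver


# Hypothetical copy numbers per transcript species.
tester = Counter({"shared_abundant": 100, "induced": 60, "tester_only": 5})
driver = Counter({"shared_abundant": 100, "induced": 2})

enriched = subtract(tester, driver)
print(dict(enriched))
```

In this model a transcript common to both populations disappears, a strongly induced transcript survives in reduced amount, and a tester-specific transcript is untouched, which mirrors the enrichment the technique aims for.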



Physical separation 

Original subtractive hybridization technology involved the physical separation 
of hybridized common species from unique single stranded species. Several methods 
of achieving this have been described, including hydroxyapatite chromatography 
(Sargent and Dawid 1983), avidin-biotin technology (Duguid and Dinauer 1990) 
and oligodT-latex separation (Hara et al. 1991). In the first approach, common 
mRNA species are removed by cDNA (from test cells)-mRNA (from control cells) 
subtractive hybridization followed by hydroxyapatite chromatography, as 
hydroxyapatite specifically adsorbs the cDNA-mRNA hybrids. The unabsorbed cDNA is 
then used either for the construction of a cDNA library of differentially expressed 
genes (Sargent and Dawid 1983, Schneider et al. 1988) or directly as a probe to 
screen a preselected library (Zimmerman et al. 1980, Davis et al. 1984, Hedrick et al. 
1984). A schematic diagram of the procedure is shown in figure 1. 

Figure 1. The hydroxyapatite method of subtractive hybridization. cDNA derived from the 
treated/altered (tester) population is mixed with a large excess of mRNA from the control (driver) 
population. Following hybridization, mRNA-cDNA hybrids are removed by hydroxyapatite 
chromatography. The only cDNAs which remain are those which are differentially expressed in 
the treated/altered population. In order to facilitate the recovery of full length clones, small cDNA 
fragments are removed by exclusion chromatography. The remaining cDNAs are then cloned into 
a vector for sequencing, or labelled and used directly to probe a library, as described by Sargent 
and Dawid (1983). 

Less rigorous physical separation procedures coupled with sensitivity enhancing 
PCR steps were later developed as a means to overcome some of the problems 
encountered with the hydroxyapatite procedure. For example, Duguid and Dinauer 
(1990) described a method of subtraction utilizing biotin-affinity systems as a means 
to remove hybridized common sequences. In this process, both the control and 
tester mRNA populations are first converted to cDNA and an adaptor ('oligovector', 
containing a restriction site) ligated to both sides. Both populations are then 
amplified by PCR, but the driver cDNA population is subsequently digested with 
the adaptor-containing restriction endonuclease. This serves to cleave the 
oligovector and reduce the amplification potential of the control population. The digested 
control population is then biotinylated and an excess mixed with tester cDNA. 
Following denaturation and hybridization, the mix is applied to a biocytin column 
(streptavidin may also be used) to remove the control population, including 
heteroduplexes formed by annealing of common sequences from the tester 
population. The procedure is repeated several times following the addition of fresh 
control cDNA. In order to further enrich those species differentially expressed in 
tester cDNA, the subtracted tester population is amplified by PCR following 
every second subtraction cycle. After six cycles of subtraction (three reamplification 
steps) the reaction mix is ligated into a vector for further analysis. 

Figure 2. The use of oligo(dT30) latex to perform subtractive hybridization. mRNA extracted from the 
control (driver) population is converted to anchored cDNA using polydT oligonucleotides 
attached to latex beads. mRNA from the treated/altered (tester) population is repeatedly 
hybridized against an excess of the anchored driver cDNA. The final population of mRNA is 
tester specific and can be converted into cDNA for cloning and other downstream applications as 
described by Hara et al. (1991). 

In a slightly different approach, Hara et al. (1991) utilized a method whereby
oligo(dT30) primers attached to a latex substrate are used to first capture mRNA
extracted from the control population. Following 1st strand cDNA synthesis, the
RNA strand of the heteroduplexes is removed by heat denaturation and
centrifugation (the cDNA-oligotex-dT30 forms a pellet and the supernatant is removed).
A quantity of tester mRNA is then repeatedly hybridized to the immobilized control
(driver) cDNA (which is present in 20-fold excess). After several rounds of
hybridization the only mRNA molecules left in the tester mRNA population are
those which are not found in the driver cDNA-oligotex-dT30 population. These
tester-specific mRNA species are then converted to cDNA and, following the
addition of adaptor sequences, amplified by PCR. The PCR products are then
ligated into a vector for further analysis using restriction sites incorporated into the
PCR primers. A schematic illustration of this subtraction process is shown in figure 2.

However, all these methods utilising physical separation have been described as
inefficient due to the requirement for large starting amounts of mRNA, significant
loss of material during the separation process and a need for several rounds of
hybridization. Hence, new methods of differential expression analysis have recently
been designed to eliminate these problems.

Chemical Cross-Linking Subtraction (CCLS)

In this technique, originally described by Hampson et al. (1992), driver mRNA
is mixed with tester cDNA (1st strand only) in a ratio of > 20:1. The common
sequences form cDNA:mRNA hybrids, leaving the tester specific species as single
stranded cDNA. Instead of physically separating these hybrids, they are inactivated
chemically using 2,5-diaziridinyl-1,4-benzoquinone (DZQ). Labelled probes are
then synthesized from the remaining single stranded cDNA species (unreacted
mRNA species remaining from the driver are not converted into probe material due
to the specificity of the Sequenase T7 DNA polymerase used to make the probe) and used
to screen a cDNA library made from the tester cell population. A schematic diagram
of the system is shown in figure 3.

It has been shown that the differentially expressed sequences can be enriched at
least 300-fold with one round of subtraction (Hampson et al. 1992), and that the
technique should allow isolation of cDNAs derived from transcripts that are present
at less than 50 copies per cell. This equates to genes at the low end of intermediate
abundance (see table 1). The main advantages of the CCLS approach are that it is
rapid, technically simple and also produces fewer false positives than other
differential expression analysis methods. However, like the physical separation
protocols, a major drawback with CCLS is the large amount of starting material
required (at least 10 µg RNA). Consequently, the technique has recently been
refined so that a renewable source of RNA can be generated. The degenerate random
oligonucleotide primed (DROP) adaptation (Hampson et al. 1996, Hampson and
Hampson 1997) uses random hexanucleotide sequences to prime solid phase-
synthesized cDNA. Since each primer includes a T7 polymerase promoter sequence
at the 5' end, the final pool of random cDNA fragments is a PCR-renewable cDNA
population which is representative of the expressed gene pool and can be used to
synthesize sense RNA for use as driver material. Furthermore, if the final pool of
random cDNA fragments is reamplified using biotinylated T7 primer and random
hexamer, the product can be captured with streptavidin beads and the antisense
strand eluted for use as tester. Since both target and driver can be generated from
the same DROP product, subtraction can be performed in both directions (i.e. for
up- and down-regulated species) between two different DROP products.

Figure 3. Chemical cross-linking subtraction. Excess driver mRNA is mixed with 1st strand tester
cDNA. The common sequences form mRNA:cDNA hybrids, which are cross-linked with DZQ,
leaving single stranded those cDNA sequences that are differentially
expressed in the tester population. Probes are made from these sequences using Sequenase 2.0
DNA polymerase, which lacks reverse transcriptase activity and, therefore, does not react with the
remaining mRNA molecules from the driver. The labelled probes are then used to screen a cDNA
library for clones of differentially expressed sequences. Adapted from Walter et al. (1996) with
permission.

Table 1. The abundance of mRNA species and classes in a typical mammalian cell.

mRNA class     Copies of each   No. of mRNA species   Mean % of each     Mean mass (ng) of each
               species/cell     in each class         species in class   species/µg total RNA

Abundant       12000            4                     3.3                1.65
Intermediate   300              500                   0.08               0.04
Rare           15               11000                 0.004              0.002

Modified from Bertioli et al. (1995).
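
The percentage and mass columns of table 1 can be reproduced from its first two columns. The Python sketch below is illustrative only. It reads the rare class as 11000 species, and the figure of 50 ng mRNA per µg total RNA is an assumption inferred from the ratio of the last two columns, not a value stated in the text.

```python
# Copies per cell and number of species per class, as listed in table 1.
classes = {
    "Abundant": (12000, 4),
    "Intermediate": (300, 500),
    "Rare": (15, 11000),
}

# Assumption (not stated in the text): mRNA is about 50 ng per ug of total RNA.
MRNA_NG_PER_UG_TOTAL_RNA = 50.0

# Total mRNA molecules per cell: sum over classes of copies x species.
total_copies = sum(copies * species for copies, species in classes.values())


def percent_per_species(copies):
    """Share (%) of the cell's mRNA molecules contributed by one species."""
    return 100.0 * copies / total_copies


def mass_ng_per_ug(copies):
    """Mass (ng) of one species in 1 ug total RNA, under the assumption above."""
    return percent_per_species(copies) / 100.0 * MRNA_NG_PER_UG_TOTAL_RNA


for name, (copies, species) in classes.items():
    print(f"{name:<13}{percent_per_species(copies):8.3f} %"
          f"{mass_ng_per_ug(copies):8.3f} ng")
```

On this reading, a transcript present at fewer than 50 copies per cell accounts for well under 0.02% of the mRNA molecules in the cell, which is why large amounts of starting RNA are needed to recover it.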

Representational Difference Analysis (RDA)

RDA of cDNA (Hubank and Schatz 1994) is an extension of the technique
originally applied to genomic DNA as a means of identifying differences between
two complex genomes (Lisitsyn et al. 1993). It is a process of subtraction and
amplification involving subtractive hybridization of the tester in the presence of
excess driver. Sequences in the tester that have homologues in the driver are
rendered unamplifiable, whereas those genes expressed only in the tester retain the
ability to be amplified by PCR. The procedure is shown schematically in figure 4.

In essence, the driver and tester mRNA populations are first converted to cDNA
and amplified by PCR following the ligation of an adaptor. The adaptors are then
removed from both populations and a new (different) adaptor ligated to the
amplified tester population only. Driver and tester populations are next melted and
hybridized together in a ratio of 100:1. Following hybridization, only tester:tester
homohybrids have 5' adaptors at each end of the DNA duplex and can, thus, be filled
in at both 3' ends. Hence, only these molecules are amplified exponentially during
the subsequent PCR step. Although tester:driver heterohybrids are present, they
only amplify in a linear fashion, since the strand derived from the driver has no
adaptor to which the primer can bind. Driver:driver homohybrids have no
adaptors and, therefore, are not amplified. Single stranded molecules are digested
with mung bean nuclease before a further PCR-enrichment of the tester:tester
homohybrids. The adaptors on the amplified tester population are then replaced and
the whole process repeated a further two or three times using an increasing excess of
driver (Hubank and Schatz used a tester:driver ratio of 1:400, 1:80000 and
1:800000 for the second, third and fourth hybridizations, respectively). Different
adaptors are ligated to the tester between successive rounds of hybridization and
amplification to prevent the accumulation of PCR products that might interfere with
subsequent amplifications. The final display is a series of differentially expressed
gene products easily observable on an ethidium bromide gel.
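
The difference between exponential, linear and no amplification is what drives the enrichment in RDA. The following Python sketch is an idealized illustration, assuming perfect PCR efficiency and a single starting molecule of each hybrid type. The function name and the cycle number are hypothetical and are not taken from the published protocol.

```python
def copies_after_pcr(hybrid, cycles, start=1):
    """Idealized copy number of one hybrid type after a number of PCR cycles.

    tester:tester  primer site on both strands, so copies double each cycle
    tester:driver  primer site on one strand only, so one new copy per cycle
    driver:driver  no primer site, so no amplification
    """
    if hybrid == "tester:tester":
        return start * 2 ** cycles
    if hybrid == "tester:driver":
        return start * (1 + cycles)
    if hybrid == "driver:driver":
        return start
    raise ValueError(f"unknown hybrid type: {hybrid}")


CYCLES = 10  # arbitrary number of cycles chosen for illustration
for hybrid in ("tester:tester", "tester:driver", "driver:driver"):
    print(f"{hybrid:<14}{copies_after_pcr(hybrid, CYCLES):>6}")
```

Even over this small number of cycles the tester:tester product exceeds the linearly amplified heterohybrid by about two orders of magnitude, and the gap widens with every further cycle.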

The main advantages of RDA are that it offers a reproducible and sensitive
approach to the analysis of differentially expressed genes. Hubank and Schatz (1994)
reported that they were able to isolate genes that were differentially expressed in
substantially less than 1% of the cells from which the tester is derived. Perhaps the
main drawback is that multiple rounds of ligation, hybridization, amplification and
digestion are required. The procedure is, therefore, lengthier than many other
differential display approaches and provides more opportunity for operator-induced
error to occur. Although the generation of false positives has been noted, this has
been solved to some degree by O'Neill and Sinclair (1997) through the use of
HPLC-purified adaptors. These are free of the truncated adaptors which appear to be a
major source of the false positive bands. A very similar technique to RDA, termed
linker capture subtraction (LCS), was described by Yang and Sytowski (1996).

Figure 4. The representational difference analysis (RDA) technique. Driver and tester cDNA are
digested with a 4-cutter restriction enzyme such as DpnII. The 1st set of 12/24 adaptor strands
(oligonucleotides) are ligated to each other and the digested cDNA products. The 12mer is
subsequently melted away and the 3' ends filled in using Taq DNA polymerase. Each cDNA
population is then amplified using PCR, following which the 1st set of adaptors is removed with
DpnII. A second set of 12/24 adaptor strands is then added to the amplified tester cDNA
population, after which the tester is hybridized against a large excess of driver. The 12mer
adaptors are melted and the 3' ends filled in as before. PCR is carried out with primers identical
to the new 24mer adaptor. Thus, the only hybridization products which are exponentially
amplified are those which are tester:tester combinations. Following PCR, ssDNA products are
removed with mung bean nuclease, leaving the 'first difference product'. This is digested and a
third set of 12/24 adaptors added before repeating the subtraction process from the hybridization
stage. The process is repeated to the 3rd or 4th difference product, as described by Lisitsyn et al.
(1993) and Hubank and Schatz (1994).

Suppression PCR Subtractive Hybridization (SSH)

The most recent adaptation of the SH approach to differential expression
analysis was first described by Diatchenko et al. (1996) and Gurskaya et al. (1996).
They reported that a 1000-5000 fold enrichment of rare cDNAs (equivalent to
isolating mRNAs present at only a few copies per cell) can be obtained without the
need for multiple hybridizations/subtractions. Instead of physical or chemical
removal of the common sequences, a PCR-based suppression system is used (see
figure 5).

In SSH, excess driver cDNA is added to two portions of the tester cDNA which
have been ligated with different adaptors. A first round of hybridization serves to
enrich differentially expressed genes and equalize rare and abundant messages.
Equalization occurs since reannealing is more rapid for abundant molecules than for
rarer molecules due to the second order kinetics of hybridization (James and Higgins
1985). The two primary hybridization mixes are then mixed together in the presence
of excess driver and allowed to hybridize further. This step permits the annealing of
single stranded complementary sequences which did not hybridize in the primary
hybridization, and in doing so generates templates for PCR amplification. Although
there are several possible combinations of the single stranded molecules present in
the secondary hybridization mix, only one particular combination (differentially
expressed in the tester cDNA, composed of complementary strands having different
adaptors) can amplify exponentially.
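
The equalization effect follows directly from second order kinetics. The Python sketch below is a simplified illustration rather than a model of the full SSH reaction. It treats each species as reannealing only with itself according to dC/dt = -k * C**2, and the rate constant, time and starting concentrations are arbitrary values chosen for illustration.

```python
def single_stranded(c0, k, t):
    """Concentration still single stranded after time t, starting from c0,
    for second order reannealing: dC/dt = -k * C**2."""
    return c0 / (1.0 + k * c0 * t)


k, t = 1.0, 100.0             # arbitrary units, chosen for illustration
abundant, rare = 1000.0, 1.0  # a 1000-fold difference before hybridization

ratio_before = abundant / rare
ratio_after = single_stranded(abundant, k, t) / single_stranded(rare, k, t)

print(f"abundant:rare before hybridization  {ratio_before:.0f}:1")
print(f"abundant:rare still single stranded {ratio_after:.2f}:1")
```

Because the single stranded concentration tends towards 1/(k * t) whatever the starting amount, the abundant and rare species end up at nearly the same single stranded concentration. As noted later in the text, equalization in practice is far from perfect.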

Having obtained the final differential display, two options are available if cloning
of cDNAs is desired. One is to transform the whole of the final PCR reaction into
competent cells. Transformed colonies can then be isolated and their inserts
characterized by sequencing, restriction analysis or PCR. Alternatively, the final
PCR products can be resolved on a gel and the individual bands excised, reamplified
and cloned. The first approach is technically simpler and less time consuming.
However, ligation/transformation reactions are known to be biased towards the
cloning of smaller molecules, and so the final population of clones will probably not
contain a representative selection of the larger products. In addition, although
equalization theoretically occurs, observations in this laboratory suggest that this is
by no means perfectly accomplished. Consequently, some gene species are present
in a higher number than others and this will be represented in the final population
of clones. Thus, in order to obtain a substantial proportion of those gene species that
actually demonstrate differential expression in the tester population, the number of
clones that will have to be screened after this step may be substantial. The second
approach is initially more time consuming and technically demanding. However, it
would appear to offer better prospects for cloning larger and low abundance gel
products. In addition, one can incorporate a screening step that differentiates
different products of different sequences but of the same size (HA-staining, see
later). In this way, a good idea of the final number of clones to be isolated and
identified can be achieved.

An alternative (or even complementary) approach is to use the final differential
display reaction to screen a cDNA library to isolate full length clones for further
characterization, or a DNA array (see later) to quickly identify known genes. SSH
has been used in this laboratory to begin characterization of the short-term gene
expression profiles of enzyme-inducers such as phenobarbital (Rockett et al. 1997)
and Wy-14,643 (Rockett et al. unpublished observations). The isolation of
differentially expressed genes in this manner enables the construction of a fingerprint
of expressed genes which are unique to each compound and time/dose point. Such
information could be useful in short-term characterization of the toxic potential of
new compounds by comparing the gene-expression profiles they elicit with those
produced by known inducers. Figure 6 shows a flow diagram of the method used to
isolate, verify and clone differentially expressed genes, and figure 7 shows expression
profiles obtained from a typical SSH experiment. Subsequent sub-cloning of the
individual bands, sequencing and gene data base interrogation reveals many genes
which are either up- or down-regulated by phenobarbital in the rat (tables 2 and 3).

One of the advantages in using the SSH approach is that no prior knowledge is
required of which specific genes are up/down-regulated subsequent to xenobiotic
exposure, and an almost complete complement of genes is obtained. For example,
the peroxisome proliferator and non-genotoxic hepatocarcinogen Wy-14,643
up-regulates at least 28 genes and down-regulates at least 15 in the rat (a sensitive
species) and produces 48 up- and 37 down-regulated genes in the guinea pig, a
resistant species (Rockett, Swales, Esda and Gibson, unpublished observations).
One of these genes, CD81, was up-regulated in the rat and down-regulated in the
guinea pig following Wy-14,643 treatment. CD81 (alternatively named TAPA-1) is
a widely expressed cell surface protein which is involved in a large number of cellular
processes including adhesion, activation, proliferation and differentiation (Levy et
al. 1998). Since all of these functions are altered to some extent in the phenomena
of hepatomegaly and non-genotoxic hepatocarcinogenesis, it is intriguing, and
probably mechanistically relevant, that CD81 expression is differentially regulated
in a resistant and susceptible species. However, the down-side of this approach is
that the majority of genes can be sequenced and matched to database sequences, but
the latter are predominantly expressed sequence tags or genes of completely
unknown function, thus partially obscuring a realistic overall assessment of the
critical genes of genuine biological interest. Notwithstanding the lack of complete
functional identification of altered gene expression, such gene profiling studies
essentially provide a 'molecular fingerprint' in response to xenobiotic challenge,
thereby serving as a mechanistically-relevant platform for further detailed
investigations.

Figure 6. Flow diagram showing method used in this laboratory to isolate and identify clones of genes
which are differentially expressed in rat liver following short term exposure to the enzyme
inducers, phenobarbital and Wy-14,643.

Figure 7. SSH display patterns obtained from rat liver following 3-day treatment with Wy-14,643 or
phenobarbital. mRNA extracted from control and treated livers was used to generate the
differential displays using the PCR-Select cDNA subtraction kit (Clontech). Lane: 1, 1 kb
ladder; 2, genes upregulated following Wy-14,643 treatment; 3, genes downregulated following
Wy-14,643 treatment; 4, genes upregulated following phenobarbital treatment; 5, genes
downregulated following phenobarbital treatment; 6, 1 kb ladder. Reproduced from Rockett et
al. (1997), with permission.

Table 2. Genes up-regulated in rat liver following 3-day exposure to phenobarbital.

Band number                 Highest sequence
(approximate size in bp)    similarity             FASTA-EMBL gene identification

5 (1300)                    [illegible]            [illegible]
7 (1000)                    [illegible]            Preproalbumin
                                                   Serum albumin mRNA
8 (950)                     98.3%                  NCI-CGAP-Pr1 H. sapiens (EST)
10 (850)                    95.7%                  CYP2B1
11 (800)                    Clone 1  94.9%         CYP2B1
                            Clone 2  75.3%         CYP2B2
12 (750)                    93.8%                  TRPM-2 mRNA
                                                   Sulfated glycoprotein
15 (600)                    92.9%                  Preproalbumin
                                                   Serum albumin mRNA
16 (55)                     Clone 1  95.2%         CYP2B1
                            Clone 2  93.6%         Haptoglobulin mRNA partial alpha
21 (350)                    99.3%                  18S, 5.8S & 28S rRNA

Bands 1-4, 6, 9, 13, 14 and 17-20 were shown to be false positives by dot blot analysis and, therefore,
were not sequenced. Derived from Rockett et al. (1997). It should be noted that the above genes do not
represent the complete spectrum of genes which are up-regulated in rat liver by phenobarbital, but
simply represent the genes sequenced and identified to date.

Table 3. Genes down-regulated in rat liver following 3-day exposure to phenobarbital.

Band number                 Highest sequence
(approximate size in bp)    similarity             FASTA-EMBL gene identification

[illegible]                 [illegible]            Alpha-2u-globulin (s-type) mRNA
[illegible]                 Clone 1  86.9%         Soares mouse NML M. musculus (EST)
                            Clone 2  82.0%         Soares p3NMF19.5 M. musculus (EST)
10 (550)                    73.8%                  Soares mouse NML M. musculus (EST)
11 (525)                    95.7%                  NCI-CGAP-Pr1 H. sapiens (EST)
12 (375)                    100.0%                 Ribosomal protein
13 (23)                     Clone 1  97.2%         Soares mouse embryo NbME13.5 (EST)
                            Clone 2  100.0%        Fibrinogen B-beta-chain
                            Clone 3  [illegible]   Apolipoprotein E gene
[illegible]                 96.0%                  Soares p3NMF19.5 M. musculus (EST)
15 (140)                    97.3%                  Stratagene mouse testis (EST)
Others: (300)               96.7%                  R. norvegicus RASP 1 mRNA
        (275)               93.1%                  Soares mouse mammary gland (EST)

EST = Expressed sequence tag. Bands 4-6 were shown to be false positives by dot blot analysis and,
therefore, were not sequenced. Derived from Rockett et al. (1997). It should be noted that the above genes
do not represent the complete spectrum of genes which are down-regulated in rat liver by phenobarbital,
but simply represent the genes sequenced and identified to date.

Differential Display (DD)

Originally described as 'RNA fingerprinting by arbitrarily primed PCR' (Liang
and Pardee 1992) this method is now more commonly referred to as 'differential
display' (DD). In this method, all the mRNA species in the control and treated cell
populations are amplified in separate reactions using reverse transcriptase-PCR
(RT-PCR). The products are then run side-by-side on sequencing gels. Those
bands which are present in one display only, or which are much more intense in one
display compared to the other, are differentially expressed and may be recovered for
further characterization. One advantage of this system is the speed with which it can
be carried out: 2 days to obtain a display and as little as a week to make and identify
clones.

Two commonly used variations are based on different methods of priming the
reverse transcription step (figure 8). One is to use an oligo dT with a 2-base 'anchor'
at the 3' end, e.g. 5'-(dT11)CA-3' (Liang and Pardee 1992). Alternatively, an
arbitrary primer may be used for 1st strand cDNA synthesis (Welsh et al. 1992).
This variant of RNA fingerprinting has also been called 'RAP' (RNA Arbitrarily
Primed)-PCR. One advantage of this second approach is that PCR products may be
derived from anywhere in the RNA, including open reading frames. In addition, it
can be used for mRNAs that are not polyadenylated, such as many bacterial mRNAs
(Wong and McClelland 1994). In both cases, following reverse transcription and
denaturation, second strand cDNA synthesis is carried out with an arbitrary primer
(arbitrary primers have a single base at each position, as compared to random
primers, which contain a mixture of all four bases at each position). The resulting
PCR, thus, produces a series of products which, depending on the system (primer
length and composition, polymerase and gel system), usually includes 50-100
products per primer set (Band and Sager 1989). When a combination of different
dT-anchors and arbitrary primers is used, almost all mRNA species from a cell can
be amplified. When the cDNA products from two different populations are analysed
side by side on a polyacrylamide gel, differences in expression can be identified and
the appropriate bands recovered for cloning and further analysis.
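
The distinction drawn above between arbitrary and random primers can be made concrete by counting sequences. The Python sketch below is purely illustrative. The 10-mer shown is a hypothetical sequence, not a primer from any cited protocol, and the hexamer length is used only as a familiar example of a random primer.

```python
from itertools import product

BASES = "ACGT"


def random_primer_pool(length):
    """Every sequence present in a fully random primer of the given length
    (a mixture of all four bases at each position)."""
    return ["".join(p) for p in product(BASES, repeat=length)]


# An arbitrary primer is one defined sequence (hypothetical example).
arbitrary_primer = "AAGCTTGATT"

# A random hexamer is a mixture of 4**6 different sequences.
hexamer_pool = random_primer_pool(6)

print("sequences in the arbitrary primer:", len({arbitrary_primer}))
print("sequences in a random hexamer:    ", len(hexamer_pool))
```

Because an arbitrary primer is a single sequence, it anneals at only a limited set of sites. This is consistent with each primer set yielding a countable series of bands rather than a smear.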

Although DD is perhaps the most popular approach used today for identifying
differentially expressed genes, it does suffer from several perceived disadvantages:

(1) It may have a strong bias towards high copy number mRNAs (Bertioli et al.
1995), although this has been disputed (Wan et al. 1996) and the isolation of very
low abundance genes may be achieved in certain circumstances (Guimeraes et
al. 1995a).

(2) The cDNAs obtained often only represent the extreme 3' end of the mRNA
(often the 3'-untranslated region), although this may not always be the case
(Guimeraes et al. 1995a). Since the 3' end is often not included in Genbank and
shows variation between organisms, cDNAs identified by DD cannot always be
matched with their genes, even if they have been identified.

(3) The pattern of differential expression seen on the display often cannot be
reproduced on Northern blots, with false positives arising in up to 70% of cases
(Sun et al. 1994). Some adaptations have been shown to reduce false positives,
including the use of two reverse transcriptases (Sung and Denman 1997),
comparison of uninduced and induced cells over a time course (Burn et al. 1994)
and comparison of DD-PCR products from two uninduced and two induced
lines (Sompayrac et al. 1995). The latter authors also reported that the use of
cytoplasmic RNA rather than total RNA reduces false positives arising from
nuclear RNA that is not transported to the cytoplasm.

Further details of the background, strengths and weaknesses of the DD
technique can be obtained from a review by McClelland et al. (1996) and from
articles by Liang et al. (1995) and Wan et al. (1996).

Figure 8. Two approaches to differential display (DD) analysis. 1st strand synthesis can be carried out
either with a polydT11NN primer (where N = G, C or A) or with an arbitrary primer. The use of
different combinations of G, C and A to anchor the first strand polydT primer enables the priming
of the majority of polyadenylated mRNAs. Arbitrary primers may hybridize at none, one or more
places along the length of the mRNA, allowing 1st strand cDNA synthesis to occur at none, one
or more points in the same gene. In both cases, 2nd strand synthesis is carried out with an arbitrary
primer. Since these arbitrary primers for the 2nd strand may also hybridize to the 1st strand cDNA
in a number of different places, several different 2nd strand products may be obtained from one
binding point of the 1st strand primer. Following 2nd strand synthesis, the original set of primers
is used to amplify the second strand products, with the result that numerous gene sequences are
amplified.



Restriction endonuclease-facilitated analysis of gene expression
Serial Analysis of Gene Expression (SAGE)

A more recent development in the field of differential display is SAGE analysis
(Velculescu et al. 1995). This method uses a different approach to those discussed so
far and is based on two principles. Firstly, in more than 95% of cases, short
nucleotide sequences ('tags') of only nine or 10 base pairs provide sufficient
information to identify their gene of origin. Secondly, concatenation (linking
together in a series) of these tags allows sequencing of multiple cDNAs within a
single clone. Figure 9 shows a schematic representation of the SAGE process. In this
procedure, double stranded cDNA from the test cells is synthesized with a
biotinylated polydT primer. Following digestion with a commonly cutting (4 bp
recognition sequence) restriction enzyme ('anchoring enzyme'), the 3' ends of the
cDNA population are captured with streptavidin beads. The captured population is
split into two and different adaptors ligated to the 5' ends of each group. Incorporated
into the adaptors is a recognition sequence for a type IIS restriction enzyme, one
which cuts DNA at a defined distance (< 20 bp) from its recognition sequence.
Hence, following digestion of each captured cDNA population with the IIS enzyme,
the adaptors plus a short piece of the captured cDNA are released. The two
populations are then ligated and the products amplified. The amplified products are
cleaved with the original anchoring enzyme, religated (concatemers are formed in
the process) and cloned. The advantage of this system is that hundreds of gene tags
can be identified by sequencing only a few clones. Furthermore, the number of times
a given transcript is identified is a quantitative measurement of that gene's
abundance in the original population, a feature which facilitates identification of
differentially expressed genes in different cell populations.
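
The counting step that makes SAGE quantitative can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the published analysis. It assumes a 4 bp anchoring site of CATG, treats the insert as single tags separated by that site (ignoring the ditag structure produced when the two populations are ligated), and uses invented tag sequences.

```python
from collections import Counter

ANCHOR_SITE = "CATG"  # assumption: 4 bp anchoring-enzyme recognition site
TAG_LENGTH = 9        # the text cites tags of nine or 10 base pairs


def count_tags(concatemer):
    """Split a sequenced concatemer at each anchoring site and count the tags."""
    pieces = concatemer.split(ANCHOR_SITE)
    tags = [piece for piece in pieces if len(piece) == TAG_LENGTH]
    return Counter(tags)


# Hypothetical clone insert: three copies of one tag and one copy of another.
tags_in_clone = ["AAACCCGGG", "TTTGGGCCC", "AAACCCGGG", "AAACCCGGG"]
insert = ANCHOR_SITE + ANCHOR_SITE.join(tags_in_clone) + ANCHOR_SITE

counts = count_tags(insert)
total = sum(counts.values())
for tag, n in counts.most_common():
    print(f"{tag}  seen {n} time(s), {100 * n / total:.0f}% of tags")
```

A 9 bp tag can take 4**9 = 262144 different values, which is the basis for the claim that such short tags usually suffice to identify the gene of origin.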

Some disadvantages of SAGE analysis include the technical difficulty of the
method, the large amount of accurate sequencing required and a bias towards abundant
mRNAs. In addition, the method has not been validated in the pharmaco/toxicogenomic
setting and has only been used to examine well known tissue differences to date.



Gene Expression Fingerprinting (GEF) 

A different capture/restriction digest approach for isolating differentially
expressed genes has been described by Ivanova and Belyavsky (1995). In this
method, RNA is converted to cDNA using biotinylated oligo(dT) primers. The
cDNA population is then digested with a specific endonuclease and captured with
magnetic streptavidin microbeads to facilitate removal of the unwanted 5' digestion
products. The use of restricted 3'-ends alone serves to reduce the complexity of the
cDNA fragment pool and helps to ensure that each RNA species is represented by
not more than one restriction product. An adaptor is ligated to facilitate subsequent
amplification of the captured population. PCR is carried out with one
adaptor-specific and one biotinylated polydT primer. The reamplified population is
recaptured and the non-biotinylated strands removed by alkaline dissociation. The
non-biotinylated strand is then resynthesized using a different adaptor-specific
primer in the presence of a radiolabelled dNTP. The labelled immobilized 3' cDNA
ends are next sequentially treated with a series of different restriction endonucleases
and the products from each digestion analysed by PAGE. The result is a fingerprint
composed of a number of ladders (equal to the number of sequential digests used).
By comparing test versus control fingerprints, it is possible to identify differentially
expressed products which can then be isolated from the gel and cloned. The
advantages of this procedure are that it is very robust and reproducible, and the
authors estimate that 80-93% of cDNA molecules are involved in the final
fingerprint. The disadvantage is that polyacrylamide gels can rarely resolve more
than 300-400 bands, which compares poorly to the 1000 or more which are
estimated to be produced in an average experiment. The use of 2-D gels such as
those described by Uitterlinden et al. (1989) and Hatada et al. (1991) may help to
overcome this problem.

A similar method for displaying restriction endonuclease fragments was later
described by Prashar and Weissman (1996). However, instead of sequential
digestion of the immobilized 3'-terminal cDNA fragments, these authors simply
compared the profiles of the control and treated populations without further
manipulation.



DNA arrays

'Open' differential display systems are cumbersome in that it takes a great deal
of time to extract and identify candidate genes and then confirm that they are indeed
up- or down-regulated in the treated compared to the control tissue. Normally, the
latter process is carried out using Northern blotting or RT-PCR. Even so, each of
the aforementioned steps produces a bottleneck to the ultimate goal of rapid analysis
of gene expression. These problems will likely be addressed by the development of
so-called DNA arrays (e.g. Gress et al. 1992, Zhao et al. 1995, Schena et al. 1996),
the introduction of which has signalled the next era in differential gene expression
analysis. DNA arrays consist of a gridded membrane or glass 'chip' containing
hundreds or thousands of DNA spots, each consisting of multiple copies of part of
a known gene. The genes are often selected based on previously proven involvement
in oncogenesis, cell cycling, DNA repair, development and other cellular processes.
They are usually chosen to be as specific as possible for each gene and animal species.
Human and mouse arrays are already commercially available and a few companies
will construct a personalized array to order, for example Clontech Laboratories and
Research Genetics Inc. The technique is rapid in that hundreds or even thousands
of genes can be spotted on a single array, and that mRNA/cDNA from the test
populations can be labelled and used directly as probe. When analysed with
appropriate hardware and software, arrays offer a rapid and quantitative means to
assess differences in gene expression between two cell populations. Of course, there
can only be identification and quantitation of those genes which are in the array
(hence the term 'closed' system). Therefore, one approach to elucidating the
molecular mechanisms involved in a particular disease/development system may be
to combine an open and closed system: a DNA array to directly identify and
quantitate the expression of known genes in mRNA populations, and an open
system such as SSH to isolate unknown genes which are differentially expressed.
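
As a rough illustration of what a quantitative comparison between two cell populations might look like, the Python sketch below computes a log2 ratio per spot. It is not the analysis used by any array supplier named above. The gene names, intensities, flooring value and two-fold threshold are all hypothetical choices made for the example.

```python
import math


def log2_ratio(treated, control, floor=1.0):
    """log2(treated/control) for one spot; 'floor' guards against zero signal."""
    return math.log2(max(treated, floor) / max(control, floor))


# Hypothetical background-corrected spot intensities: (treated, control).
spots = {
    "geneA": (800.0, 200.0),
    "geneB": (150.0, 600.0),
    "geneC": (310.0, 300.0),
}

calls = {}
for gene, (treated, control) in spots.items():
    ratio = log2_ratio(treated, control)
    # Assumed threshold: call a gene changed only at two-fold or more.
    if ratio >= 1.0:
        calls[gene] = "up"
    elif ratio <= -1.0:
        calls[gene] = "down"
    else:
        calls[gene] = "unchanged"
    print(f"{gene}: log2 ratio {ratio:+.2f}, {calls[gene]}")
```

Only genes spotted on the array appear in such a comparison, which is the sense in which the text describes arrays as a 'closed' system.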

One of the main advantages of DNA arrays is the huge number of gene fragments
which can be put on a membrane; some companies have reported gridding up to
60000 spots on a single glass 'chip' (microscope slide). These high density
chip-based micro-arrays will probably become available as mass-produced off-the-shelf
items in the near future. This should facilitate the more rapid determination of
differential expression in time and dose-response experiments. Aside from their
high cost and the technical complexities involved in producing and probing DNA
arrays, the main problem which remains, especially with the newer micro-array
(gene-chip) technologies, is that results are often not wholly reproducible between
arrays. However, this problem is being addressed and should be resolved within the
next few years.



EST databases as a means to identify differentially expressed genes

Expressed sequence tags (ESTs) are partial sequences of clones obtained from
cDNA libraries. Even though most ESTs have no formal identity (putative
identification is the best to be hoped for), they have proven to be a rapid and efficient
means of discovering new genes and can be used to generate profiles of gene
expression in specific cells. Since they were first described by Adams et al. (1991),
there has been a huge explosion in EST production and it is estimated that there are
now well over a million such sequences in the public domain, representing over half
of all human genes (Hillier et al. 1996). This large number of freely available
sequences (both sequence information and clones are normally available royalty-free
from the originators) has enabled the development of a new approach towards
differential gene expression analysis as described by Vasmatzis et al. (1998). The
approach is simple in theory: EST databases are first searched for genes that have a
number of related EST sequences from the target tissue of choice, but none or few
from non-target tissue libraries. Programmes to assist in the assembly of such sets of
overlapping data may be developed in-house or obtained privately or from the
internet. For example, the Institute for Genomic Research (TIGR, found at
http://www.tigr.org) provides many software tools free of charge to the scientific
community. Included amongst these is the TIGR assembler (Sutton et al. 1995), a
tool for the assembly of large sets of overlapping data such as ESTs, bacterial
artificial chromosomes (BACs), or small genomes. Candidate EST clones representing
different genes are then analysed using RNA blot methods for size and tissue
specificity and, if required, used as probes to isolate and identify the full length
cDNA clone for further characterization. In practice, however, the method is rather
more involved, requiring bioinformatic and computer analysis coupled with
confirmatory molecular studies. Vasmatzis et al. (1998) have described several
problems in this fledgling approach, such as separating highly homologous
sequences derived from different genes and an overemphasis of specificity for some
EST sequences. However, since these problems will largely be addressed by the
development of more suitable computer algorithms and an increased completeness
of the EST database, it is likely that this approach to identifying differentially
expressed genes may enjoy more patronage in the future.
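
The database search described above (many ESTs from the target tissue, none or few from other libraries) can be sketched in a few lines. This is an illustrative outline only: the record layout, the cluster and tissue names, and the two thresholds are hypothetical choices for the example and do not reproduce the procedure of Vasmatzis et al. (1998) or the format of any real EST database.

```python
from collections import Counter


def candidate_genes(est_records, target_tissue, min_target=3, max_other=1):
    """Pick gene clusters that are well represented in the target tissue only.

    est_records: iterable of (cluster_id, tissue) pairs, one pair per EST.
    min_target / max_other: assumed cut-offs for "a number of" ESTs in the
    target tissue and "none or few" ESTs elsewhere.
    """
    in_target = Counter()
    elsewhere = Counter()
    for cluster, tissue in est_records:
        if tissue == target_tissue:
            in_target[cluster] += 1
        else:
            elsewhere[cluster] += 1
    return sorted(
        cluster
        for cluster, n in in_target.items()
        if n >= min_target and elsewhere[cluster] <= max_other
    )


# Hypothetical EST records: clusterA is tissue-restricted, clusterB is not,
# and clusterC has too few ESTs to judge.
est_records = (
    [("clusterA", "prostate")] * 4
    + [("clusterB", "prostate")] * 3
    + [("clusterB", "liver")] * 5
    + [("clusterC", "prostate")]
)
print(candidate_genes(est_records, "prostate"))
```

In practice the clustering of overlapping ESTs into genes is the harder step, and candidates found this way would still need the confirmatory RNA blot analysis described in the text.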



Problems and potential of differential expression techniques 
The holistic or single cell approach?

When working with in vivo models of differential expression, one of the first
issues to consider must be the presence of multiple cell types in any given specimen.
For example, a liver sample is likely to contain not only hepatocytes, but also
(potentially) Ito cells, bile ductule cells, endothelial cells, various immune cells (e.g.
lymphocytes, macrophages and Kupffer cells) and fibroblasts. Other tissues will
each have their own distinctive cell populations. Also, in the case of neoplastic tissue,
there are almost always normal, hyperplastic and/or dysplastic cells present in a
sample. One must, therefore, be aware that genes obtained from a differential
display experiment performed on an animal tissue model may not necessarily arise
exclusively from the intended 'target' cells, e.g. hepatocytes/neoplastic cells. If
appropriate, further analyses using immunohistochemistry, in situ hybridization or
in situ RT-PCR should be used to confirm which cell types are expressing the
gene(s) of interest. This problem is probably most acute for those studying the
differential expression of genes in the development of different cell types, where
there is a need to examine homologous cell populations. The problem is now being
addressed at the National Cancer Institute (Bethesda, MD, USA) where new
microdissection techniques have been employed to assist in their gene analysis programme,
the Cancer Genome Anatomy Project (CGAP) (for more information see web site:
http://www.ncbi.nlm.nih.gov/ncicgap/intro.html). There are also separation
techniques available that utilise cell-specific antigens as a means to isolate target cells,
e.g. fluorescence activated cell sorting (FACS) (Dunbar et al. 1998, Kas-Deelen et
al. 1998) and magnetic bead technology (Richard et al. 1998, Rogler et al. 1998).

However, those taking a holistic approach may consider this issue unimportant.
There is an equally appropriate view that all those genes showing altered expression
within a compromised tissue should be taken into consideration. After all, since all
tissues are complex mixes of different, interacting cell types which intimately
regulate each other's growth and development, it is clear that each cell type could in
some way contribute (positively or negatively) towards the molecular mechanisms
which lie behind responses to external stimuli or neoplastic growth. It is perhaps
then more informative to carry out differential display experiments using in vivo as
opposed to in vitro models, where uniform populations of identical cells probably
represent a partial, skewed or even inaccurate picture of the molecular changes that
occur.



The incidence and possible implications of inter-individual biological variation
should be considered in any approach where whole animal models are being used. It
is clear that individuals (humans and animals) respond in different ways to identical
stimuli. One of the best characterized examples is the debrisoquine oxidation
polymorphism, which is mediated by cytochrome CYP2D6 and determines the
pharmacokinetics of many commonly prescribed drugs (Lennard 1993, Meyer and
Zanger 1997). The reasons for such differences are varied and complex, but allelic
variations, regulatory region polymorphisms and even physical and mental health
can all contribute to observed differences in individual responses. Careful thought
should, therefore, be given to the specific objectives of the study and to the possible
value of pooling starting material (tissue/mRNA). The effect of this can be
beneficial through the ironing out of exaggerated responses and unimportant minor
fluctuations of (mechanistically) irrelevant genes in individual animals, thus
providing a clearer overall picture of the general molecular mechanisms of the
response. However, at the same time such minor variations may be of utmost
importance in deciding the ability of individual animals to succumb to or resist the
effects of a given chemical/disease.



How efficient are differential expression techniques at recovering a high percentage of
differentially expressed genes?

A number of groups have produced experimental data suggesting that mammalian
cells produce between 8000 and 15000 different mRNA species at any one time
(Mechler and Rabbitts 1981, Hedrick et al. 1984, Bravo 1990), although figures as
high as 20000-30000 have also been quoted (Axel et al. 1976). Hedrick et al. (1984)
provided evidence suggesting that the majority of these belong to the rare abundance
class. A breakdown of this abundance distribution is shown in table 1.
When the results of differential display experiments have been compared with
data obtained previously using other methods, it is apparent that not all differentially
expressed mRNAs are represented in the final display. In particular, rare messages
(which, importantly, often encode regulatory proteins) are not easily recovered
using differential display systems. This is a major shortcoming, as the majority of
mRNA species exist at levels of less than 0.005% of the total population (table 1).
Bertioli et al. (1995) examined the efficiency of DD templates (heterogeneous
mRNA populations) for recovering rare messages and were unable to detect mRNA
species present at less than 1.2% of the total mRNA population, equivalent to an
intermediate or abundant species. Interestingly, when simple model systems (single
target only) were used instead of a heterogeneous mRNA population, the same
primers could detect levels of target mRNA down to 10000 x smaller. These results
are probably best explained by competition for substrates from the many PCR
products produced in a DD reaction.
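
The size of the gap between these figures is easier to appreciate when worked through. The short calculation below uses only the percentages quoted above; the variable names are ours, and the result is simple arithmetic rather than a new experimental finding.

```python
# Figures quoted in the text (percent of the total mRNA population).
dd_detection_limit = 1.2     # lowest level detected with heterogeneous templates
rare_class_ceiling = 0.005   # most mRNA species lie below this level
single_target_gain = 10000   # improvement reported with single-target templates

# How far above the rare-message ceiling the DD detection limit sits.
shortfall = dd_detection_limit / rare_class_ceiling

# Detection limit of the same primers when only one target is present.
single_target_limit = dd_detection_limit / single_target_gain

print(f"DD limit is about {shortfall:.0f}-fold above the rare-class ceiling")
print(f"single-target limit is about {single_target_limit:.5f}% of total mRNA")
```

On these numbers the primers themselves are sensitive enough to reach rare messages when a single target is present, which is consistent with the suggestion that competition within the complex DD reaction, rather than primer performance, limits recovery.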

The numbers of differentially expressed mRNAs reported in the literature using
various model systems provide further evidence that many differentially expressed
mRNAs are not recovered. For example, DeRisi et al. (1997) used DNA array
technology to examine gene expression in yeast following exhaustion of sugar in the
medium, and found that more than 1700 genes showed a change in expression of at
least 2-fold. In light of such a finding, it would not be unreasonable to suggest that
of the 8000-15000 different mRNA species produced by any given mammalian cell,
up to 1000 or more may show altered expression following chemical stimulation.
Whilst this may be an extreme figure, it is known that at least 100 genes are
activated/upregulated in Jurkat (T-) cells following IL-2 stimulation (Ullman et al.
1990). In addition, Wan et al. (1996) estimated that interferon-γ-stimulated HeLa
cells differentially express up to 433 genes (assuming 24000 distinct mRNAs
expressed by the cells). However, there have been few publications documenting
anywhere near the recovery of these numbers. For example, in using DD to compare
normal and regenerating mouse liver, Bauer et al. (1993) found only 70 of 38000
total bands to be different. Of these, 50% (35 genes) were shown to correspond to
differentially expressed bands. Chen et al. (1996) reported 10 genes upregulated in
female rat liver following ethinyl estradiol treatment. McKenzie and Drake (1997)
identified 14 different gene products whose expression was altered by phorbol
myristate acetate (PMA, a tumour promoter agent) stimulation of a human
myelomonocytic cell line. Kilty and Vickers (1997) identified 10 different gene
products whose expression was upregulated in the peripheral blood leukocytes of
allergic disease sufferers. Linskens et al. (1995) found 23 genes differentially
expressed between young and senescent fibroblasts. Techniques other than DD
have also provided an apparent paucity of differentially expressed genes. Using SH,
for example, Cao et al. (1997) found 15 genes differentially expressed in colorectal
cancer compared to normal mucosal epithelium, Fitzpatrick et al. (1995) isolated 17
genes upregulated in rat liver following treatment with the peroxisome proliferator
clofibrate, and Philips et al. (1990) isolated 12 cDNA clones which were upregulated in
highly metastatic mammary adenocarcinoma cell lines compared to poorly metastatic
ones. Prashar and Weissman (1996) used 3' restriction fragment analysis and
identified approximately 40 genes showing altered expression within 4 h of
activation of Jurkat T-cells. Groenink and Leegwater (1996) analysed 27 gene
fragments isolated using SSH of delayed early response phase of liver regeneration
and found only 12 to be upregulated.

In the laboratory, SSH was used to isolate up to 70 candidate genes which appear
to show altered expression in guinea pig liver following short-term treatment with
the peroxisome proliferator, Wy-14,643 (Rockett, Swales, Esdaile and Gibson,
unpublished observations). However, these findings have still to be confirmed by
analysis of the extracted tissue mRNA for differential expression of these sequences.
Whilst the latest differential display technologies are purported to include design
and experimental modifications to overcome this lack of efficiency (in both the total
number of differentially expressed genes recovered and the percentage that are true
positives), it is still not clear if such adaptations are practically effective [...]
experiments and animals. DD, on the other hand, is not subject to this grey
zone since, unlike SH approaches, it does not amplify the difference in expression
between two samples. Wan et al. (1996) reported that differences in expression of
twofold or more are detectable using DD.



Resolution and visualization of differential expression products 

It seems highly improbable with current technology that a gel system could be
developed that is able to resolve all gene species showing altered expression in any
given test system (be it SH- or DD-based). Polyacrylamide gel electrophoresis
(PAGE) can resolve size differences down to 0.2% (Sambrook et al. 1989) and is
used as standard in DD experiments. Even so, it is clear that a complex series of gene
products such as those seen in a DD will contain unresolvable components. Thus,
what appears to be one band in a gel may in fact turn out to be several. Indeed, it has
been well documented (Mathieu-Daude et al. 1996, Smith et al. 1997) that a single
band extracted from a DD often represents a composite of heterogeneous products,
and the same has been found for SSH displays in this laboratory (Rockett et al.
1997). One possible solution was offered by Mathieu-Daude et al. (1996), who
extracted and reamplified candidate bands from a DD display and used single strand
conformation polymorphism (SSCP) analysis to confirm which components
represented the truly differentially expressed product.

Many scientists try to avoid the use of PAGE where possible because it is
technically more demanding than agarose gel electrophoresis (AGE). Unfortunately,
high resolution agarose gels such as Metaphor (FMC, Lichfield, UK) and AquaPor
HR (National Diagnostics, Hessle, UK), whilst easier to prepare and manipulate
than PAGE, can only separate DNA sequences which differ in size by around
1.5-2% (15-20 base pairs for a 1 kb fragment). Thus, SSH, RDA or other such
products which differ in size by less than this amount are normally not resolvable.
However, a simple technique does in fact exist for increasing the resolving power of
AGE: the inclusion of HA-red (10-phenyl neutral red-PEG ligand) or HA-yellow
(bisbenzamide-PEG ligand) (Hanse Analytik GmbH, Bremen, Germany) in a
gel separates identical or closely sized products on base content. Specifically,
HA-red and -yellow selectively bind to GC and AT DNA motifs, respectively
(Wawer et al. 1995, Hanse Analytik 1997, personal communication). Since both
HA-stains possess an overall positive charge, they migrate towards the cathode
when an electric field is applied. This is in direct opposition to DNA, which
is negatively charged and, therefore, migrates towards the anode. Thus, if two
DNA clones are identical in size (as perceived on a standard high resolution
agarose gel), but differ in AT/GC content, inclusion of a HA-dye in the gel
will effectively retard the migration of one of the sequences compared to the
other, making it apparently larger and, thus, providing a means of
differentiating between the two. The use of HA-red has been shown to resolve
sequences with an AT variation of less than 1% (Wawer et al. 1995), whilst Hanse
Analytik have reported that HA staining is so sensitive that in one case it was used
to distinguish two 567 bp sequences which differed by only a single point mutation
(Hanse Analytik 1996, personal communication). Therefore, if one wishes to check
whether all the clones produced from a specific band in a differential display
experiment are derived from the same gene species, a small amount of reamplified
or digested clone can be run on a standard high resolution gel, and a second aliquot
in a similar gel containing one of the HA-stains. The standard gel should indicate
any gross size differences, whilst the HA-stained gel should separate otherwise
identically sized products according to their base content. Geisinger
et al. (1997) reported successful use of this approach for identifying DD-derived
clones. Figure 10 shows such an experiment carried out in this laboratory on clones
obtained from a band extracted from an SSH display.

Figure 10. Discrimination of clones of identical/nearly identical size using HA-red. Bands of decreasing
[...] different gene species are represented.
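
The percentage resolution limits quoted for PAGE (0.2%) and high resolution agarose (1.5-2%) translate into base pairs as follows. This is a small worked sketch; the function names and the 995 bp example fragment are assumptions introduced here for illustration.

```python
def smallest_resolvable_difference(fragment_bp, resolution_percent):
    """Size difference (bp) corresponding to a percentage resolution limit."""
    return fragment_bp * resolution_percent / 100.0


def resolvable(size_a, size_b, resolution_percent):
    """True if two fragments differ by at least the limit for the larger one."""
    larger = max(size_a, size_b)
    limit = smallest_resolvable_difference(larger, resolution_percent)
    return abs(size_a - size_b) >= limit


fragment = 1000  # a 1 kb fragment, as in the text
page_limit = smallest_resolvable_difference(fragment, 0.2)
agarose_limits = (
    smallest_resolvable_difference(fragment, 1.5),
    smallest_resolvable_difference(fragment, 2.0),
)
print(f"PAGE: about {page_limit:.0f} bp; agarose: about "
      f"{agarose_limits[0]:.0f}-{agarose_limits[1]:.0f} bp")

# A hypothetical pair of products differing by 5 bp.
print(resolvable(1000, 995, 0.2))  # PAGE limit
print(resolvable(1000, 995, 1.5))  # agarose limit
```

For a 1 kb product the agarose limit is the 15-20 bp stated in the text, so two products 5 bp apart would be expected to separate on PAGE but co-migrate on agarose unless they also differ in base content.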

An alternative approach is to carry out a 2-D analysis of the differential display
products. In this approach, size-based separation is first carried out in a standard
agarose gel. The gel slice containing the display is then extracted and incorporated
into a HA gel for resolution based on AT/GC content.
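
Where clone sequences are already available, the logic of separating first on size and then on base content can be mimicked computationally. The sketch below is only an analogy to the two gel dimensions: the toy sequences are invented, and the size tolerance (2%) and GC tolerance (1%) are assumed values loosely based on the resolution figures mentioned in the text, not measured properties of any gel system.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def separating_dimension(seq_a, seq_b, size_tol=0.02, gc_tol=0.01):
    """Report which dimension, if any, would distinguish two clones.

    Returns "size", "base content" or None. Both tolerances are
    illustrative assumptions.
    """
    longer = max(len(seq_a), len(seq_b))
    if abs(len(seq_a) - len(seq_b)) / longer > size_tol:
        return "size"
    if abs(gc_fraction(seq_a) - gc_fraction(seq_b)) > gc_tol:
        return "base content"
    return None


# Two invented 20-base clones of equal length but different GC content.
clone_1 = "ATGCGCGCATGCGCGCATGC"
clone_2 = "ATATATGCATATATGCATAT"
print(separating_dimension(clone_1, clone_2))
```

Clones for which the function returns None correspond to the case of different gene species with the same size and the same GC/AT content, where a further method is needed.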

Of course, one should always consider the possibility of there being different
gene species which are the same size and have the same GC/AT content. However,
this should be resolvable given some effort: again, one might use
SSCP, or perhaps a denaturing gradient gel electrophoresis (DGGE) or temperature
gradient gel electrophoresis (TGGE) approach to resolve the contents of a band,
either directly on the extracted band (Suzuki et al. 1991) or on the reamplified
product.

The requirement of some differential display techniques to visualize large
numbers of products (e.g. DD and GEF) can also present a problem in that, in terms
of numbers, the resolution of PAGE rarely exceeds 300-400 bands. One approach to
overcoming this might be to use 2-D gels such as those described by Uitterlinden et
al. (1989) and Hatada et al. (1991).

Extraction of differentially expressed bands from a gel can be complex since, in
some cases (e.g. DD, GEF), the results are visualized by autoradiographic means,
such that precise overlay of the developed film on the gel must occur if the correct
band is to be extracted for further analysis. Clearly, a misjudged extraction can
account for many man-hours lost. This problem, and that of the use of radioisotopes,
has been addressed by several groups. For example, Lohmann et al. (1995)
demonstrated that silver staining can be used directly to visualize DD bands in
horizontal polyacrylamide gels. An et al. (1996) avoided the use of radioisotopes by transferring a
small amount (20-30%) of the DNA from their DD to a nylon membrane, and
visualizing the bands using chemiluminescent staining before going back to extract
the remaining DNA from the gel. Chen and Peck (1996) went one step further and
transferred the entire DD to a nylon membrane. The DNA bands were then
visualized using a digoxigenin (DIG) system (DIG was attached to the polydT
primers used in the differential display procedure). Differentially expressed bands
were cut from the membrane and the DNA eluted by washing with PCR buffer prior
to reamplification.

One of the advantages of using techniques such as SSH and RDA is that the final 
display can be run on an agarose gel and the bands visualized with simple ethidium 
bromide staining. Whilst this approach can provide acceptable results, overstaining 
with SYBR Green I or SYBR Gold nucleic acid stains (FMC) effectively enhances 
the intensity and sharpness of the bands. This greatly aids in their precise extraction 
and often reveals some faint products that may otherwise be overlooked. Whilst 
differential displays stained with SYBR Green I are better visualized using short 
wavelength UV (254 nm) rather than medium wavelength (306 nm), the shorter 
wavelength is much more DNA damaging. In practice, it takes only a few seconds 
to damage DNA extracted under 254 nm irradiation, effectively preventing 
reamplification and cloning. The best approach is to overstain with SYBR Green I 
and extract bands under a medium wavelength UV transillumination. 



The possible use of 'microfingerprinting' to reduce complexity

Given the sheer number of gene products and the possible complexity of each
band, an alternative approach to rapid characterization may be to use an enhanced
analysis of a small section of a differential display: a 'sub-fingerprint' or
'micro-fingerprint'. In this case, one could concentrate on those bands which only appear
in a particular chosen size region. Reducing the fingerprint in this way has at least
two advantages. One is that it should be possible to use different gel types,
concentrations and run times tailored exactly to that region. Currently, one might
run products from 100-3000+ bp on the same gel, which leads to compromise in the
gel system being used and consequently to suboptimal resolution, both in terms of
size and numbers, and can lead to problems in the accurate excision of individual
bands. Secondly, it may be possible to enhance resolution by using a 2-D analysis
using a HA-stain, as described earlier. In summary, if a range of gene product sizes
is carefully chosen to include certain 'relevant' genes, the 2-D system standardized,
and appropriate gene analysis used, it may be possible to develop a method for the
early and rapid identification of compounds which have similar or widely different
cellular effects. If the prognosis for exposure to one or more other chemicals which
display a similar profile is already known, then one could perhaps predict similar
effects for any new compounds which show a similar micro-fingerprint.

An alternative approach to microfingerprinting is to examine altered expression
in specific families of genes through careful selection of PCR primers and/or
post-reaction analysis. Stress genes, growth factors and/or their receptors, cell cycling
genes, cytochromes P450 and regulatory proteins might be considered as candidates.
Off-the-shelf DNA arrays (e.g. Clontech's
Atlas cDNA Expression Array series) have already anticipated this to some degree by
grouping together genes involved in different responses, e.g. apoptosis, stress,
DNA-damage response etc.



Screening 

False positives 

False positives have been discussed at length in the literature ([...] 1995,
Nishio et al. 1994, Sun et al. [...]). The origin of false positives varies with the
technique used. [...] adaptors which have not
been HPLC purified can lead to the production of false positives through illegitimate
[...]. In [...] they can arise through
PCR artifacts and illegitimate transcription of rRNA. In SH, false positives appear
[...]; some may arise from
cDNA/mRNA species which do not undergo hybridization for technical reasons.

A quick screening of putative differentially expressed clones can be carried out
using a simple dot blot approach, in which labelled first strand probes synthesized
from tester and driver are hybridized to an array of said clones (Hedrick et
al. 1984). Differentially expressed clones will hybridize to
tester probe, but not driver. The disadvantage of this approach is that rare species
may not generate detectable hybridization signals. One option for those using SSH
is to screen the clones using a labelled probe generated from the subtracted cDNA
from which it was derived, and with a probe made from the reverse subtraction.
Since SSH enriches rare sequences,
it should be possible to confirm the presence of clones representing low abundance
genes. Even after this quick screening step, there is still the need to go back to the
original mRNA and confirm the altered expression using a more quantitative
approach. Although this may be achieved using Northern blots, the sensitivity is
poor by today's high standards and one must rely on PCR methods for accurate and
sensitive determinations (see below).



Sequence analysis 

The majority of differential display procedures produce final products which are
between 100 and 1000 bp in size. However, this may considerably reduce the size of
the sequence for analysis of the DNA databases. This in turn leads to a reduced
confidence in the result: several families of genes have members whose DNA
sequences are almost identical except in a few key stretches, e.g. the cytochrome
P450 gene superfamily (Nelson et al. 1996). Thus, does the clone identified as being
almost identical to gene X1 really come from that gene, or its brother gene X2, or its
as yet undiscovered sister X3? For example, using SSH, part of a gene was isolated
which was up-regulated in the liver of rats exposed to Wy-14,643 and was identified
by a FASTA search as being transferrin (data not shown). However, transferrin is
known to be downregulated by hypolipidemic peroxisome proliferators such as
Wy-14,643 (Hertz et al. 1996), and this was confirmed with subsequent RT-PCR
analysis. This suggests that the gene sequence isolated may belong to a gene which
is closely related to transferrin, but is regulated by a different mechanism.

A further problem associated with SH technology is redundancy. In most cases
before SH is carried out, the cDNA population must first be simplified by restriction
digestion. This is important for at least two reasons:

(1) To reduce complexity: long cDNA fragments may form complex networks
which prevent the formation of appropriate hybrids, especially at the high
concentrations required for efficient hybridization.

(2) Cutting the cDNAs into small fragments provides better representation of
individual genes. This is because genes derived from related but distinct
members of gene families often have similar coding sequences that may
cross-hybridize and be eliminated during the subtraction procedure (Ko 1990).
Furthermore, different fragments from the same cDNA may differ considerably
in terms of hybridization and amplification and, thus, may not efficiently do one
or the other (Wang and Brown 1991). Thus, some fragments from differentially
expressed cDNAs may be eliminated during subtractive hybridization
procedures. However, other fragments may be enriched and isolated. As a
consequence of this, some genes will be cut one or more times, giving rise to two
or more fragments of different sizes. If those same genes are differentially
expressed, then two or more of the different size fragments may come through
as separate bands on the final differential display, increasing the observed
redundancy and increasing the number of redundant sequencing reactions.
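
The way in which restriction digestion produces several fragments from one cDNA, and hence redundant bands, can be shown with a toy digest. The recognition site GTAC with a cut after its second base is used purely as an illustrative four-base cutter (it corresponds to the enzyme RsaI); the text does not specify which enzyme is used, and the sequence below is invented.

```python
def digest(seq, site, cut_offset):
    """Cut `seq` at every occurrence of `site`.

    cut_offset is the position within the site at which the strand is cut
    (e.g. 2 for a cut between the second and third base of the site).
    Only one strand is modelled, which is adequate for a blunt cutter
    with a palindromic site.
    """
    fragments = []
    start = 0
    pos = seq.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(seq[start:cut])
        start = cut
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])
    return [fragment for fragment in fragments if fragment]


# An invented 20-base "cDNA" containing two recognition sites.
cdna = "AAAAGTACCCCCGTACTTTT"
fragments = digest(cdna, "GTAC", 2)
print(fragments)
print([len(fragment) for fragment in fragments])
```

Here one gene yields three fragments; if the gene is differentially expressed, each fragment may appear as a separate band and be cloned and sequenced separately, which is the redundancy described in point (2).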

Sequence comparisons also throw up another important point: at what degree
of sequence similarity does one accept a result? Is 90% identity between a gene
derived from your model species and another acceptably close? Is 95% between
your sequence and one from the same species also acceptable? This problem is
particularly relevant when the forward and reverse sequence comparisons give
similar sequences with completely different gene species! An arbitrary decision
seems to be to allocate genes that are definite (95% and above similarity) and then
group those between 60 and 95% as being related or possible homologues.
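
The arbitrary cut-offs suggested above are easily expressed as a rule. In the sketch below the 95% and 60% boundaries come from the text, whereas the label for matches below 60%, the helper for computing identity, and the example sequences are assumptions added for illustration; a real analysis would take the identity figure from the database search output.

```python
def percent_identity(aligned_a, aligned_b):
    """Percentage of identical positions in two pre-aligned sequences."""
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("sequences must be aligned to the same non-zero length")
    matches = sum(x == y for x, y in zip(aligned_a, aligned_b))
    return 100.0 * matches / len(aligned_a)


def classify_match(identity_percent):
    """Apply the 95% / 60% cut-offs to a database match."""
    if identity_percent >= 95.0:
        return "definite"
    if identity_percent >= 60.0:
        return "related or possible homologue"
    return "unassigned"  # label assumed here; the text gives no category


# Two invented, already aligned 10-base sequences differing at one position.
identity = percent_identity("ATGCATGCAT", "ATGCATGCAA")
print(identity, classify_match(identity))
```

A rule of this kind does not resolve the case where forward and reverse reads match different genes; it only makes the chosen thresholds explicit and consistent across clones.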

Quantitative analysis 

At some point, one must give consideration to the quantitative analysis of the
candidate genes, either as a means of confirming that they are truly differentially
expressed, or in order to establish just what the differences are. Northern blot
analysis is a popular approach as it is relatively easy and quick to perform. However,
the major drawback with Northern blots is that they are often not sensitive enough
to detect rare sequences. Since the majority of messages expressed in a cell are of low
abundance (see table 1), this is a major problem. Consequently, RT-PCR may be the
method of choice for confirming differential expression. Although the procedure is
somewhat more complex than Northern analysis, requiring synthesis of primers and
optimization of reaction conditions for each gene species, it is now possible to set up
high throughput PCR systems using multichannel pipettes, 96+ well plates and
appropriate thermal cycling technology. Whilst quantitative analysis is more
desirable, being more accurate and without reliance on an internal standard, the
money and time needed to develop a competitor molecule is often excessive,
especially when one might be examining tens or even hundreds of gene species. The
use of semi-quantitative analysis is simpler, although still relatively involved. One
must first of all choose an internal standard that does not change in the test cells
compared to the controls. Numerous reference genes have been tried in the past, for
example interferon-gamma (IFN-γ, Frye et al. 1989), β-actin (Heuval et al. 1994),
glyceraldehyde-3-phosphate dehydrogenase (GAPDH, Wong et al. 1994),
dihydrofolate reductase (DHFR, Mohler and Butler 1991), β-2-microglobulin (β2m,
Murphy et al. 1990), hypoxanthine phosphoribosyl transferase (HPRT, Foss et
al. 1998) and a number of others (ClonTechniques 1997b). Ideally, an internal
standard should not change its level of expression in the cell regardless of cell age,
stage in the cell cycle or through the effects of external stimuli. However, it has been
shown on numerous occasions that the levels of most housekeeping genes currently
used by the research community do in fact change under certain conditions and in
different tissues (ClonTechniques 1997b). It is imperative, therefore, that
preliminary experiments be carried out on a panel of housekeeping genes to establish
their suitability for use in the model system.
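
One simple way to compare a panel of housekeeping genes is to ask which varies least across control and treated samples, and then to express the target gene relative to it. The following is a minimal sketch under stated assumptions: the signal values are invented, the coefficient of variation is used as the stability measure by our choice rather than as an established standard, and the gene labels are merely examples from the list above.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    return pstdev(values) / mean(values)


def most_stable_reference(panel):
    """Return the gene whose signal varies least across all samples.

    panel: {gene: [signal in each control and treated sample]}.
    """
    return min(panel, key=lambda gene: coefficient_of_variation(panel[gene]))


def normalized_fold_change(target_treated, target_control, ref_treated, ref_control):
    """Fold change of the target after dividing by the reference in each sample."""
    return (target_treated / ref_treated) / (target_control / ref_control)


# Invented band intensities for a hypothetical pilot experiment
# (two control samples followed by two treated samples for each gene).
panel = {
    "GAPDH": [100.0, 95.0, 180.0, 170.0],
    "beta-actin": [100.0, 98.0, 104.0, 101.0],
    "HPRT": [50.0, 52.0, 70.0, 66.0],
}
reference = most_stable_reference(panel)
print(reference)
print(round(normalized_fold_change(300.0, 100.0, 104.0, 100.0), 2))
```

With these made-up numbers the reference chosen is the one least affected by treatment; had a treatment-responsive gene been used instead, the apparent change in the target would have been distorted accordingly.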

Interpretation of quantitative data must also be treated with caution. By
comparing the lists of genes identified by differential expression one can perhaps
gain insight into why two different species react in different ways to external stimuli.
For example, rats and mice appear sensitive to the non-genotoxic effects of a wide
range of peroxisome proliferators whilst Syrian hamsters and guinea pigs are largely
resistant (Orton et al. 1984, Rodricks and Turnbull 1987, Lake et al. 1989, 1993,
Makowska et al. 1992). A simplified approach to resolving the reason(s) why is to
compare lists of up- and down-regulated genes in order to identify those which are
expressed in only one species and, through background knowledge of the effects of
the said gene, might suggest a mechanism of facilitated non-genotoxic carcinogenesis
or protection. Of course, the situation is likely to be far more complex. Perhaps if
there were one key gene protecting guinea pig from non-genotoxic effects and it was
upregulated 50 times by PPs, the same gene might only be up-regulated five times
in the rat. However, since both were noted to be upregulated, the importance of the
gene may be overlooked. Just to complicate matters, a large change in expression
does not necessarily mean a biologically important change. For example, what is the
true relevance of gene Y which shows a 50-fold increase after a particular treatment,
and gene Z which shows only a 5-fold increase? If one examines the literature one
may find that historically, gene Y has often been shown to be up-regulated 40-60-
fold by a number of unrelated stimuli; in light of this the 50-fold increase would
appear less significant. However, the literature may show that gene Z has never been
recorded as having more than doubled in expression, which makes your 5-fold
increase all the more exciting. Perhaps even more interesting is if that same 5-fold
increase has only been seen in related neoplasms or following treatment with related
chemicals.
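
The gene Y / gene Z comparison can be made concrete with a small calculation. The sketch below is purely illustrative: the function name relative_to_history is hypothetical, and dividing the observed fold change by the largest fold change previously reported is just one simple way to express the argument, not an established statistic.

```python
def relative_to_history(observed_fold, historical_max_fold):
    """Ratio of the observed fold change to the largest fold change seen before."""
    return observed_fold / historical_max_fold


# Gene Y: 50-fold now, but routinely 40-60-fold with unrelated stimuli.
gene_y = relative_to_history(50.0, 60.0)

# Gene Z: 5-fold now, and never previously more than doubled.
gene_z = relative_to_history(5.0, 2.0)

print(f"gene Y: {gene_y:.2f}x its historical maximum")
print(f"gene Z: {gene_z:.2f}x its historical maximum")
```

On this scale the modest 5-fold change in gene Z stands out, whereas the 50-fold change in gene Y falls within what has already been seen.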



Problems in using the differential display approach

Differential display technology originally held promise of an easily obtainable
'fingerprint' of those genes which are up- or down-regulated in test animals/cells in
a developmental process or following exposure to given stimuli. However, it has
become clear that the fingerprinting process, whilst still valid, is much too complex
to be represented by a single technique profile. This is because all differential display
techniques have common and/or unique technical problems which preclude the
isolation and identification of all those genes which show changes in expression.
Furthermore, there are important genetic changes related to disease development
which differential expression analysis is simply not designed to address. An example
of this is the presence of small deletions, insertions, or point mutations such as those
seen in activated oncogenes, tumour suppressor genes and individual
polymorphisms. Polymorphic variations, small though they usually are, are often
regarded as being of paramount importance in explaining why some patients
respond better than others to certain drug treatments (and, in logical extension, why
some people are less affected by potentially dangerous xenobiotics/carcinogens than
others). The identification of such point mutations and naturally occurring
polymorphisms requires the subsequent application of sequencing, SSCP, DGGE
or TGGE to the gene of interest. Furthermore, differential display is not designed
to address issues such as alternatively spliced gene species or whether an increased
abundance of mRNA is a result of increased transcription or increased mRNA
stability.

Conclusions

Perhaps the main advantage of open system differential display techniques is that
they are not limited by extant theories or researcher bias in revealing genes which are
differentially expressed, since they are designed to amplify all genes which
demonstrate altered expression. This means that they are useful for the isolation of
previously unknown genes which may turn out to be useful biomarkers of a particular
state or condition. At least one open system (SAGE) is also quantitative, thus
eliminating the need to return to the original mRNA and carry out Northern/PCR
analysis to confirm the result. However, the rapid progress of genome mapping
projects means that over the next 5-10 years or so, the balance of experimental use
will switch from open to closed differential display systems, particularly DNA
arrays. Arrays are easier and faster to prepare and use, provide quantitative data, are
suitable for high throughput analysis and can be tailored to look at specific signalling
pathways or families of genes. Identification of all the gene sequences in human and
common laboratory animals, combined with improved DNA array technology,
means that it will soon no longer be necessary to try to isolate differentially expressed
genes using the technically more demanding open system approach. Thus, their
main advantage (that of identifying unknown genes) will be largely eradicated. It is
likely, therefore, that their sphere of application will be reduced to analysis of the
less common laboratory species, since it will be some time yet before the genomes of
such animals as zebrafish, electric eels, gerbils, crayfish and squid, for example, will
be sequenced.

Of course, in the end the question will always remain: What is the functional/
biological significance of the identified, differentially expressed genes? One
persistent problem is understanding whether differentially expressed genes are a
cause or consequence of the altered state. Furthermore, many chemicals, such as
non-genotoxic carcinogens, are also mitogens and so genes associated with
replication will also be upregulated but may have little or nothing to do with the
carcinogenic effect. Whilst differential display technology cannot hope to answer
these questions, it does provide a springboard from which identification, regulatory
and functional studies can be launched. Understanding the molecular mechanism of
cellular responses is almost impossible without knowing the regulation and function
[illegible]. In an abstract sense, differential
display can be likened to a still photograph, showing details of a fixed moment in
time. Consider the Historian who knows the outcome of a battle and the placement
and condition of the troops before the battle commenced, but is asked to try and
deduce how the battle progressed and why it ended as it did from a few still
photographs, an impossible task. In order to understand the battle, the Historian
must find out the capabilities and motivation of the soldiers and their commanding
officers, what the orders were and whether they were obeyed. He must examine the
[illegible] the prevailing weather
conditions exerted. Likewise, if mechanistic answers are to be forthcoming, the
scientist must use differential display in combination with other techniques, such as
[illegible] mutation analysis and
time and dose response analyses. Although this review has emphasized the
[illegible] the full impact of this approach will be strengthened if used in combination with
functional genomics and proteomics (2-dimensional protein gels from isoelectric
focusing and subsequent SDS electrophoresis and virtual 2D-maps using capillary
electrophoresis). Proteomics is attracting much recent attention as many of the
[illegible] herein, but rather protein-protein, protein-DNA and
protein phosphorylation events which would require functional genomics or
proteomic technologies for investigation.

[illegible] differential display technology, it is clear that many
potential applications and benefits can be obtained from characterizing the genetic
changes that occur in a cell during normal and disease development and in response
to chemical or biological insult. In light of functional data, such profiling
[illegible] development or response, and in the long
term should help in the elucidation of specific and sensitive biomarkers for different
types of chemical/biological exposure and disease states. The potential medical and
therapeutic benefits of understanding such molecular changes are almost
immeasurable. Amongst other things, such fingerprints could indicate the family or
[illegible] has been exposed to plus the length
and/or acuteness of that exposure, thus indicating the most prudent treatment.
They may also help uncover differences in histologically identical cancers, provide
diagnostic tests for the earliest stages of neoplasia and, again, perhaps indicate the
most efficacious treatment.

The Human Genome Project will be completed early in the next century and the
DNA sequence of all the human genes will be known. The continuing development
[illegible]
knowledge contributes fully to the understanding of human disease processes.

The availability of genome-scale DNA sequence information and reagents has radically altered life-science
[illegible] development of a new scientific subdiscipline derived from a combination
of the fields of toxicology and genomics. This subdiscipline, termed toxicogenomics, is concerned with the
[illegible] toxicants, and their putative mechanisms of action through
[illegible], which allow the monitoring of
the expression levels of thousands of genes simultaneously. Here we propose a general method by which gene
expression, as measured by cDNA microarrays, can be used as a highly sensitive and informative marker for
toxicity. Our purpose is to acquaint the reader with the development and current state of microarray technology
[illegible]

Key words: toxicology; gene expression; animal bioassay

INTRODUCTION 

Technological advancements combined with intensive DNA
sequencing efforts have generated an enormous database of
sequence information over the past decade. To date, more
than 3 million sequences, totaling over 2.2 billion bases
[1], are contained within the GenBank database, which
includes the complete sequences of 19 different organisms
[2]. The first complete sequence of a free-living organism,
Haemophilus influenzae, was reported in 1995 [3] and was
followed shortly thereafter by the first complete sequence
of a eukaryote, Saccharomyces cerevisiae [4]. The
development of dramatically improved sequencing
methodologies promises that complete elucidation of the
Homo sapiens DNA sequence is not far behind [5].

To exploit more fully the wealth of new sequence
information, it was necessary to develop novel methods for
the high-throughput or parallel monitoring of gene
expression. Established methods such as northern blotting,
RNase protection assays, S1 nuclease analysis, plaque
hybridization, and slot blots do not provide sufficient
throughput to effectively utilize the new genomics
resources. Newer methods such as differential display [6],
high-density filter hybridization [7,8], serial analysis of
gene expression [9], and cDNA- and oligonucleotide-based
microarray "chip" hybridization [10-12] are possible
solutions to this bottleneck. It is our belief that the
microarray approach, which allows the monitoring of
expression levels of thousands of genes simultaneously, is
a tool of unprecedented power for use in toxicology
studies.



Almost without exception, gene expression is altered during
toxicity, as either a direct or indirect result of toxicant
exposure. The challenge facing toxicologists is to define,
under a given set of experimental conditions, the
characteristic and specific pattern of gene expression
elicited by a given toxicant. Microarray technology offers
an ideal platform for this type of analysis and could be the
foundation for a fundamentally new approach to toxicology
testing.

MICROARRAY DEVELOPMENT AND APPLICATIONS 
cDNA Microarrays 

In the past several years, numerous systems were developed
for the construction of large-scale DNA arrays. All of these
platforms are based on cDNAs or oligonucleotides
immobilized to a solid support. In the cDNA approach, cDNA
(or genomic) clones of interest are arrayed in a multi-well
format and amplified by polymerase chain reaction. The
products of this amplification, which are usually 500- to
2000-bp clones from the 3' regions of the genes of interest,
are then spotted onto solid support by using high-speed
robotics. By using this method, microarrays of up to 10 000
clones can be generated by spotting onto a glass substrate
[13,14]. Sample detection for microarrays on glass involves
the use of probes labeled with fluorescent or radioactive
nucleotides.

Fluorescent cDNA probes are generated from control and test
RNA samples in single-round reverse-transcription reactions
in the presence of fluorescently tagged dUTP (e.g., Cy3-dUTP
and Cy5-dUTP), which produces control and test products
labeled with different fluors. The cDNAs generated from
these two populations, collectively termed the "probe," are
then mixed and hybridized to the array under a glass
coverslip [10,11,15]. The fluorescent signal is detected by
using a custom-designed scanning confocal microscope
equipped with a motorized stage and lasers for fluor
excitation [10,11,15]. The data are analyzed with custom
digital image analysis software that determines for each DNA
feature the ratio of fluor 1 to fluor 2, corrected for local
background [16,17]. The strength of this approach lies in
the ability to label RNAs from control and treated samples
with different fluorescent nucleotides, allowing for the
simultaneous hybridization and detection of both populations
on one microarray. This method eliminates the need to
control for hybridization between arrays. The research
groups of Drs. Patrick Brown and Ron Davis at Stanford
University spearheaded the effort to develop this approach,
which has been successfully applied to studies of
Arabidopsis thaliana RNA [10], yeast genomic DNA [15],
tumorigenic versus non-tumorigenic human tumor cell lines
[11], human T-cells [18], yeast RNA [19], and human
inflammatory disease-related genes [20]. The most dramatic
result of this effort was the first published account of
gene expression of an entire genome, that of the yeast
Saccharomyces cerevisiae [21].
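
The per-feature calculation described above (ratio of fluor 1 to fluor 2, corrected for local background) can be sketched as follows. This is a minimal illustration, not the custom image analysis software cited in the text: the function name corrected_ratio, the choice of test channel over control channel, the handling of non-positive signals, and the intensity values are all assumptions.

```python
import math


def corrected_ratio(test_signal, test_background, control_signal, control_background):
    """Background-corrected test/control ratio for one array feature.

    Returns None when either corrected signal is not positive, because the
    ratio is then not meaningful.
    """
    test = test_signal - test_background
    control = control_signal - control_background
    if test <= 0 or control <= 0:
        return None
    return test / control


# Hypothetical raw intensities for a single spot.
ratio = corrected_ratio(1100.0, 100.0, 600.0, 100.0)
log_ratio = math.log2(ratio)

print(f"ratio = {ratio}, log2 ratio = {log_ratio}")
```

Working on a log2 scale is a common convenience, since a two-fold induction and a two-fold suppression then sit at +1 and -1.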

In an alternative approach, large numbers of cDNA clones can
be spotted onto a membrane support, albeit at a lower
density [7,22]. This method is useful for expression
profiling and large-scale screening and mapping of genomic
or cDNA clones [7,22-24]. In expression profiling on filter
membranes, two different membranes are used simultaneously
for control and test RNA hybridizations, or a single
membrane is stripped and reprobed. The signal is detected by
using radioactive nucleotides and visualized by
phosphorimager analysis or autoradiography. Numerous
companies now sell such cDNA membranes and software to
analyze the image data [25-27].

Oligonucleotide Microarrays 

Oligonucleotide microarrays are constructed either by
spotting prefabricated oligos on a glass support [13] or by
the more elegant method of direct in situ oligo synthesis on
the glass surface by photolithography [28-30]. The strength
of this approach lies in its ability to discriminate DNA
molecules based on single base-pair difference. This allows
the application of this method to the fields of medical
diagnostics, pharmacogenetics, and sequencing by
hybridization as well as gene-expression analysis.

Fabrication of oligonucleotide chips by photolithography is
theoretically simple but technically complex [29,30]. The
light from a high-intensity mercury lamp is directed through
a photolithographic mask onto the silica surface, resulting
in deprotection of the terminal nucleotides in the
illuminated regions. The entire chip is then reacted with
the desired free nucleotide, resulting in selected chain
elongation. This process requires only 4n cycles (where n =
oligonucleotide length in bases) to synthesize a vast number
of unique oligos, the total number of which is limited only
by the complexity of the photolithographic mask and the chip
size [29,31,32].
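
The 4n figure follows from offering each of the four nucleotides once at every position of the growing chain. The toy simulation below illustrates that counting argument only; it is not a model of the chemistry, and the function name synthesize and the example sequences are hypothetical.

```python
BASES = "ACGT"


def synthesize(targets):
    """Build every target oligo in parallel; return (products, cycles_used)."""
    length = len(targets[0])
    if any(len(target) != length for target in targets):
        raise ValueError("all oligos must have the same length")
    products = ["" for _ in targets]
    cycles = 0
    for position in range(length):
        for base in BASES:
            # One cycle: the mask exposes only the spots whose next base is
            # `base`, and those spots are extended by that nucleotide.
            for spot, target in enumerate(targets):
                if target[position] == base:
                    products[spot] += base
            cycles += 1
    return products, cycles


targets = ["ACGT", "TTGA", "CCCC"]
products, cycles = synthesize(targets)
print(products, cycles)
```

The cycle count depends only on the oligo length, not on how many distinct oligos are built, which is why the number of unique sequences is limited by mask complexity and chip size rather than by synthesis time.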

Sample preparation involves the generation of
double-stranded cDNA from cellular poly(A)+ RNA followed by
antisense RNA synthesis in an in vitro transcription
reaction with biotinylated or fluor-tagged nucleotides. The
RNA probe is then fragmented to facilitate hybridization. If
the indirect visualization method is used, the chips are
incubated with fluor-linked streptavidin (e.g.,
phycoerythrin) after hybridization [12,33]. The signal is
detected with a custom confocal scanner [34]. This method
has been applied successfully to the mapping of genomic
library clones [35], to de novo sequencing by hybridization
[28,36], and to evolutionary sequence comparison of the
BRCA1 gene [37]. In addition, mutations in the cystic
fibrosis [38] and BRCA1 [39] gene products and polymorphisms
in the human immunodeficiency virus-1 clade B protease gene
[40] have been detected by this method. Oligonucleotide
chips are also useful for expression monitoring [33], as has
been demonstrated by the simultaneous evaluation of
gene-expression patterns in nearly all open reading frames
of the yeast strain S. cerevisiae [12]. More recently,
oligonucleotide chips have been used to help identify single
nucleotide polymorphisms in the human [41] and yeast [42]
genomes.

THE USE OF MICROARRAYS IN TOXICOLOGY 

Screening for Mechanism of Action

The field of toxicology uses numerous in vivo model systems,
including the rat, mouse, and rabbit, to assess potential
toxicity, and these bioassays are the mainstay of toxicology
testing. However, in the past several decades, a plethora of
in vitro techniques have been developed to measure toxicity,
many of which measure toxicant-induced DNA damage. Examples
of these assays include the Ames test, the Syrian hamster
embryo cell transformation assay, micronucleus assays,
measurements of sister chromatid exchange and unscheduled
DNA synthesis, and many others. Fundamental to all of these
methods is the fact that toxicity is often preceded by, and
results in, alterations in gene expression. In many cases,
these changes in gene expression are a far more sensitive,
characteristic, and measurable endpoint than the toxicity
itself. We therefore propose that a method based on
measurements of the genome-wide gene expression pattern of
an organism after toxicant exposure is fundamentally
informative and complements the established methods
described above.

We are developing a method by which toxicants can be
identified and their putative mechanisms of action
determined by using toxicant-induced gene expression
profiles. In this method, in one or more defined model
systems, dose and time-course parameters are established for
a series of toxicants within a given prototypic class (e.g.,
polycyclic aromatic hydrocarbons (PAHs)). Cells are then
treated with these agents at a fixed toxicity level (as
measured by cell survival), RNA is harvested, and
toxicant-induced gene expression changes are assessed by
hybridization to a cDNA microarray chip (Figure 1). We have
developed a custom DNA chip, called ToxChip v1.0,
specifically for this purpose and will discuss it in more
detail below. The changes in gene expression induced by the
test agents in the model systems are analyzed, and the
common set of changes unique to that class of toxicants,
termed a toxicant signature, is determined.

This signature is derived by ranking across all experiments
the gene-expression data based on relative fold induction or
suppression of genes in treated samples versus untreated
controls and selecting the most consistently different
signals across the sample set. A different signature may be
established for each prototypic toxicant class. Once the
signatures are determined, gene-expression profiles induced
by unknown agents in these same model systems can then be
compared with the established signatures. A match assigns a
putative mechanism of action to the test compound. Figure 2
illustrates this signature method for different types of
oxidant stressors, PAHs, and peroxisome proliferators. In
this example, the unknown compound in question had a
gene-expression profile similar to that of the oxidant
stressors in the database. We anticipate that this general
method will also reveal cross talk between different
pathways induced by a single agent (e.g., reveal that a
compound has both PAH-like and oxidant-like properties). In
the future, it may be necessary to distinguish very subtle
differences between compounds within a very large sample set
(e.g., thousands of highly similar structural isomers in a
combinatorial chemistry library or peptide library). To
generate these highly refined signatures, standard
statistical clustering techniques or principal-component
analysis can be used.
For the studies outlined in Figure 2, we developed
the custom cDNA microarray chip ToxChip v1.0.

Figure 1. Simplified overview of the method for sample
preparation and hybridization to cDNA microarrays. For
illustrative purposes, samples derived from cell culture are
depicted, although other sample types are amenable to this
analysis.

Figure 2. Representation of the method for identification of
a toxicant's mechanism of action. In this method,
gene-expression data derived from exposure of model systems
to known toxicants are analyzed, and a set of changes
characteristic to that type of toxicant (termed the toxicant
signature) is identified. As depicted, oxidant stressors
produce

The 2090 human genes that comprise this subarray were
selected for their well-documented involvement in basic
cellular processes as well as their responses to different
types of toxic insult. Included on this list are DNA
replication and repair genes, apoptosis genes, and genes
responsive to PAHs and dioxin-like compounds, peroxisome
proliferators, estrogenic compounds, and oxidant stress.
Some of the other categories of genes include transcription
factors, oncogenes, tumor suppressor genes, cyclins,
kinases, phosphatases, cell adhesion and motility genes, and
homeobox genes. Also included in this group are 84
housekeeping genes, whose hybridization intensity is
averaged and used for signal normalization of the other
genes on the chip. To date, very few toxicants have been
shown to have appreciable effects on the expression of these
housekeeping genes. However, this housekeeping list will be
revised if new data warrant the addition or deletion of a
particular gene. Table 1 contains a general description of
some of the different classes of genes that comprise ToxChip
v1.0.
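
The normalization step mentioned above, in which the averaged housekeeping intensity serves as the reference for every other gene, can be sketched as follows. The function name normalize_chip, the gene labels, and the intensity values are hypothetical, and a plain arithmetic mean is assumed.

```python
from statistics import mean


def normalize_chip(intensities, housekeeping_genes):
    """Divide every spot intensity by the mean housekeeping intensity."""
    reference = mean(intensities[gene] for gene in housekeeping_genes)
    return {gene: value / reference for gene, value in intensities.items()}


# Made-up raw intensities for one hybridization.
raw = {"hk_1": 200.0, "hk_2": 400.0, "target_1": 900.0, "target_2": 150.0}
normalized = normalize_chip(raw, ["hk_1", "hk_2"])
print(normalized)
```

Revising the housekeeping list, as the text anticipates, would only change the set of genes passed as the second argument.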

When a toxicant signature is determined, the
[illegible] are flagged within the
database. When uncharacterized toxicants are then
screened, the data can be quickly reformatted so that
blocks of genes representing the different signatures


consistent changes in group A genes (indicated by red and
[illegible] circles), [illegible] genes (indicated by [illegible]
circles). The set of gene-expression changes elicited by the
suspected toxicant is then compared with these characteristic
[illegible]



are displayed [11]. This facilitates rapid, visual
interpretation of data. We are also developing ToxChip v2.0
and chips for other model systems, including rat, mouse,
Xenopus, and yeast, for use in toxicology studies.

Animal Models in Toxicology Testing 

The toxicology community relies heavily on the use of
animals as model systems for toxicology testing.
Unfortunately, these assays are inherently expensive,
require large numbers of animals and take a long time to
complete and analyze. Therefore, the National Institute of
Environmental Health Sciences (NIEHS), the National
Toxicology Program, and the toxicology community at large
are committed to reducing the number of animals used, by
developing more efficient and alternative testing
methodologies. Although substantial progress has been made
in the development of alternative methods, bioassays are
still used for testing endpoints such as neurotoxicity,
immunotoxicity, reproductive and developmental toxicology,
and genetic toxicology. The rodent cancer bioassay is a
particularly expensive and time-consuming assay, as it
requires almost 4 yr, 1200 animals, and millions of dollars
to execute and analyze [43]. In vitro experiments of the
type outlined in Figure 2 might provide evidence that an
unknown


Table 1. ToxChip v1.0: A Human cDNA Microarray Chip
Designed to Detect Responses to Toxic Insult

Gene category*                            No. of genes on chip

Apoptosis                                  72
DNA replication and repair                 99
Oxidative stress/redox homeostasis         90
Peroxisome proliferator responsive         22
Dioxin/PAH responsive                      12
Estrogen responsive                        63
Housekeeping                               84
Oncogenes and tumor suppressor genes       76
Cell-cycle control                         51
Transcription factors                     131
Kinases                                   276
Phosphatases                               88
Heat-shock proteins                        23
Receptors                                 349
Cytochrome P450s                           30

*This list is intended as a general guide. The gene categories are not
unique, and some genes are listed in multiple categories.

agent is (or is not) responsible for eliciting a given
biological response. This information would help to select
a bioassay more specifically suited to the agent in question
or perhaps suggest that a bioassay is not necessary, which
would dramatically reduce cost, animal use, and time.

The addition of microarray techniques to standard bioassays
may dramatically enhance the sensitivity and
interpretability of the bioassay and possibly reduce its
cost. Gene-expression signatures could be determined for
various types of tissue-specific toxicants, and new
compounds could be screened for these characteristic
signatures, providing a rapid and sensitive in vivo test.
Also, because gene expression is often exquisitely sensitive
to low doses of a toxicant, the combination of
gene-expression screening and the bioassay might allow the
use of lower toxicant doses, which are more relevant to
human exposure levels, and the use of fewer animals. In
addition, gene-expression changes are normally measured in
hours or days, not in the months to years required for tumor
development. Furthermore, microarrays might be particularly
useful for investigating the relationship between acute and
chronic toxicity and identifying secondary effects of a
given toxicant by studying the relationship between the
duration of exposure to a toxicant and the gene-expression
profile produced. Thus, a bioassay that incorporates
gene-expression signatures with traditional endpoints might
be substantially shorter, use more realistic dose regimens,
and cost substantially less than the current assays do.

These considerations are also relevant for branches of
toxicology not related to human health and not using rodents
as model systems, such as aquatic toxicology and plant
pathology. Bioassays based on the flathead minnow, Daphnia,
and Arabidopsis could also be improved by the addition of
microarray analysis. The combination of microarrays with
traditional bioassays might also be useful for investigating
some of the more intractable problems in toxicology
research, such as the effects of complex mixtures and the
difficulties in cross-species extrapolation.

Exposure Assessment, Environmental Monitoring,
and Drug Safety

The currently used methods for assessment of exposure to
chemical toxicants are based on measurement of tissue toxin
levels or on surrogate markers of toxicity, termed
biomarkers (e.g., peripheral blood levels of hepatic enzymes
or DNA adducts). Because gene expression is a sensitive
endpoint, gene expression as measured with microarray
technology may be useful as a new biomarker to more
precisely identify hazards and to assess exposure.
Similarly, microarrays could be used in an
environmental-monitoring capacity to measure the effect of
potential contaminants on the gene-expression profiles of
resident organisms. In an analogous fashion, microarrays
could be used to measure gene-expression endpoints in
subjects in clinical trials. The combination of these
gene-expression data and more established toxic endpoints in
these trials could be used to define highly precise
surrogates of safety.

Gene-expression profiles in samples from exposed individuals
could be compared to the profiles of the same individuals
before exposure. From this information, the nature of the
toxic exposure can be determined or a relative clinical
safety factor estimated. In the future it may also be
possible to estimate not only the nature but the dose of the
toxicant for a given exposure, based on relative
gene-expression levels. This general approach may be
particularly appropriate for occupational-health
applications, in which unexposed and exposed samples from
the same individuals may be obtainable. For example, a pilot
study of gene expression in peripheral-blood lymphocytes of
Polish coke-oven workers exposed to PAHs (and many other
compounds) is under consideration at the NIEHS. An important
consideration for these types of studies is that gene
expression can be affected by numerous factors, including
diet, health, and personal habits. To reduce the effects of
these confounding factors, it may be necessary to compare
pools of control samples with pools of treated samples. In
the future it may be possible to compare exposed sample sets
to a national database of human-expression data, thus
eliminating the need to provide an unexposed sample from the
same individual. Efforts to develop such a national
gene-expression database are currently under way [44,45].
However, this national database approach will require a
better understanding of genome-wide gene expression across
the highly diverse human population and of the effects of
environmental factors on this expression.

Alleles, Oligo Arrays, and Toxicogenetics

Gene sequences vary between individuals, and this
variability can be a causative factor in human diseases of
environmental origin [46,47]. A new area of toxicology,
termed toxicogenetics, was recently developed to study the
relationship between genetic variability and toxicant
susceptibility. This field is not the subject of this
discussion, but it is worthwhile to note that the ability of
oligonucleotide arrays to discriminate DNA molecules based
on single base-pair differences makes these arrays uniquely
useful for this type of analysis. Recent reports
demonstrated the [illegible] of this approach [41,
[illegible]]. [illegible] Environmental Genome Project
[illegible] sequence polymorphisms in 200 genes thought to
be involved in environmental diseases [48]. In a pilot study
on the feasibility of this application to the Environmental
Genome Project, oligonucleotide arrays will be used
[illegible]. This toxicogenetic [illegible] to dramatically
improve our understanding [illegible] variability in disease
susceptibility.

FUTURE PRIORITIES 
There are many issues that must be addressed before
the full potential of microarrays in toxicology
[...] and the temporal nature of gene expression.
In other words, in which species, at what dose, and at
what time do we look for toxicant-induced gene
expression? If human samples are analyzed, how
variable is global gene expression between
individuals, before and after toxicant exposure? What
are the effects of age, diet, and other factors on this
expression? Experience in the [...] of toxicant
exposure will answer these questions.

One of the most pressing issues for array scientists
[...] linked to the existing public databases) to serve
as a repository for gene-expression data. This relational
database must be made available for public use, and
researchers must be encouraged to submit their
expression data so that others may view and query the
[...] at the National Institutes of Health have made
laudable progress in developing the first generation of
such a database (44 [...]. In addition, improved
statistical methods for gene clustering [...] are needed
to analyze the data in such a public database.

[...] of different platforms and methods by
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which data will be collected by different
laboratories will make large-scale data analysis
difficult. To help circumvent these future problems, a
set of standards to be included on all platforms
should be established. These standards would
facilitate data entry into the national database and
serve as reference points for cross-platform and
interlaboratory data analysis.

Many issues remain to be resolved, but it is clear
that new molecular techniques such as microarray
hybridization will have a dramatic impact on toxicology
research. In the future, the information gathered
from microarray-based hybridization experiments will
form the basis for an improved method to assess the
impact of chemicals on human and environmental health.
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DNA array technology makes it possible to rapidly genotype individuals or quantify the expression
of thousands of genes on a single filter or glass slide, and holds enormous potential in toxicologic
applications. This potential led to a U.S. Environmental Protection Agency-sponsored workshop
titled "Application of Microarrays to Toxicology" on 7-8 January 1999 in Research Triangle Park,
North Carolina. In addition to providing state-of-the-art information on the application of DNA or
gene microarrays, the workshop catalyzed the formation of several collaborations, committees, and
user's groups throughout the Research Triangle Park area and beyond. Potential applications of
microarrays to toxicologic research and risk assessment include genome-wide expression analyses to
identify gene expression networks and toxicant-specific signatures that can be used to define mode
of action, for exposure assessment, and for environmental monitoring. Arrays may also prove useful
for monitoring genetic variability and its relationship to toxicant susceptibility in human
populations. Key words: DNA arrays, gene arrays, microarrays, toxicology. Environ Health Perspect
107:681-685 (1999). [Online 6 July 1999]
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Decoding the genetic blueprint is a dream that
offers manifold returns in terms of understanding
how organisms develop and function in an
often hostile environment. With the rapid
advances in molecular biology over the last 30
years, the dream has come a step closer to reality.
Molecular biologists now have the ability to
elucidate the composition of any genome.
Indeed, almost 20 genomes have already been
sequenced and more than 60 are currently
under way. Foremost among these is the
Human Genome Mapping Project. However,
the genomes of a number of commonly used
laboratory species are also under intensive
investigation, including yeast, Arabidopsis,
maize, rice, zebra fish, mouse, rat, and dog. It
is widely expected that the completion of such
programs will facilitate the development of
many powerful new techniques and approaches
to diagnosing and treating genetically and
environmentally induced diseases which afflict
mankind. However, the vast amount of data
being generated by genome mapping will
require new high-throughput technologies to
investigate the function of the millions of new
genes that are being reported. Among the most
widely heralded of the new functional
genomics technologies are DNA arrays, which
represent perhaps the most anticipated new
molecular biology technique since polymerase
chain reaction (PCR).

Arrays enable the study of literally thousands
of genes in a single experiment. The
potential importance of arrays is enormous and
has been highlighted by the recent publication
of an entire Nature Genetics supplement dedicated
to the technology (1). Despite this huge
surge of interest, DNA arrays are still little used
and largely unproven, as demonstrated by the
high ratio of review and press articles to actual
data papers. Even so, the potential they offer
has driven venture capitalists into a frenzy of
investment, and many new companies are
springing up to claim a share of this rapidly
developing market.

The U.S. Environmental Protection
Agency (EPA) is interested in applying DNA
array technology to ongoing toxicologic studies.
To learn more about the current state of
the technology, the Reproductive Toxicology
Division (RTD) of the National Health and
Environmental Effects Research Laboratory
(NHEERL; Research Triangle Park, NC)
hosted a workshop on "Application of
Microarrays to Toxicology" on 7-8 January
1999 in Research Triangle Park, North
Carolina. The workshop was organized by
David Dix, Robert Kavlock, and John Rockett
of the RTD/NHEERL. Twenty-two intramural
and extramural scientists from government,
academia, and industry shared information,
data, and opinions on the current and
future applications for this exciting new
technology. The workshop had more than 150
attendees, including researchers, students, and
administrators from the EPA, the National
Institute of Environmental Health Sciences
(NIEHS), and a number of other establishments
from Research Triangle Park and
beyond. Presentations ranged from the
technology behind array production through the
sharing of actual experimental data and
projections on the future importance and
applications of arrays. The information contained in
the workshop presentations should provide aid
and insight into arrays in general and their
application to toxicology in particular.

Array Elements 

In the context of molecular biology, the word
"array" is normally used to refer to a series of
DNA or protein elements firmly attached in
a regular pattern to some kind of supportive
medium. DNA array is often used
interchangeably with gene array or microarray.
Although not formally defined, microarray is
generally used to describe the higher density
arrays typically printed on glass chips. The
DNA elements that make up DNA arrays
can be oligonucleotides, partial gene
sequences, or full-length cDNAs. Companies
offering premade arrays that contain less
than full-length clones normally use regions
of the genes which are specific to that gene to
prevent false positives arising through
cross-hybridization. Sequence verification of
cDNA clone identity is necessary because of
errors in identifying specific clones from
cDNA libraries and databases. Premade
DNA arrays printed on membranes are
currently or imminently available for human,
mouse, and rat. In most cases they contain
DNA sequences representing several thousand
different sequence clusters or genes as
delineated through the National Center for
Biotechnology Information UniGene Project
(2). Many of these different UniGene clusters
(putative genes) are represented only by
expressed sequence tags (ESTs).

Array Printing 

Arrays are typically printed on one of two
types of support matrix. Nylon membranes
are used by most off-the-shelf array providers
such as Clontech Laboratories, Inc.
(Palo Alto, CA), Genome Systems, Inc. (St.
Louis, MO), and Research Genetics, Inc.
(Huntsville, AL). Microarrays such as those
produced by Affymetrix, Inc. (Santa Clara,
CA), Incyte Pharmaceuticals, Inc. (Palo Alto,
CA), and many do-it-yourself (DIY) arraying
groups use glass wafers or slides. Although
standard microscope slides may be used, they
must be pretreated to facilitate sticking
of the DNA to the glass. Several different
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coatings have been successfully used, including
silane and lysine. The coating of slides
can easily be carried out in the laboratory,
but many prefer the convenience of precoated
slides available from suppliers.

Once the support matrix has been prepared,
the DNA elements can be applied by
several methods. Affymetrix, Inc. has developed
a unique photolithographic technology
for attaching oligonucleotides to glass wafers.
More commonly, DNA is applied by either
noncontact or contact printing. Noncontact
printers can use thermal, solenoid, or piezoelectric
technology to spray aliquots of solution
onto the support matrix and may be used to
produce slide- or membrane-based arrays.
Cartesian Technologies, Inc. (Irvine, CA) has
developed nQUAD technology for use in its
PixSys printers. The system couples a syringe
pump with a microsolenoid valve, a combination
that provides rapid quantitative dispensing
of nanoliter volumes (down to 4.2 nL) over
a variable volume range. A different approach
to noncontact printing uses a solid pin and ring
combination (Genetic MicroSystems, Inc.,
Woburn, MA). This system (Figure 1) allows a
broader range of samples, including cell
suspensions and particulates, because the printing
head cannot be blocked up in the same way as
a spray nozzle. Fluid transfer is controlled in
this system primarily by the pin dimensions
and the force of deposition, although the
nature of the support matrix and the sample
will also affect transfer to some degree.

In contact printing, the pin head is dipped
in the sample and then touched to the support
matrix to deposit a small aliquot. Split pins
were one of the first contact-printing devices
to be reported and are the suggested format
for DIY arrayers, as described by Brown (5).
Split pins are small metal pins with a precise
groove cut vertically in the middle of the pin
tip. In this system, 1-48 split pins are
positioned in the pin head. The split pins work by
simple capillary action, not unlike a fountain
pen: when the pin heads are dipped in the
sample, liquid is drawn into the pin groove. A
small (fixed) volume is then deposited each
time the split pins are gently touched to
the support matrix. Sample (100-500 pL
depending on a variety of parameters) can be
deposited on multiple slides before refilling is
required, and array densities of > 2,300
spots/cm² may be produced. The deposit volume
depends on the split size, sample fluidity,
and the speed of printing. Split pins are
relatively simple to produce and can be made
in-house if a suitable machine shop is available.
Alternatively, they can be obtained
directly from companies such as TeleChem
International, Inc. (Sunnyvale, CA).
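
To give a feel for what a density of 2,300 spots/cm² means physically, the following minimal sketch converts a spot density into a center-to-center spacing. It assumes spots laid out on a regular square grid, which the text does not state; the function name is illustrative only.

```python
from math import sqrt

def spot_pitch_um(spots_per_cm2):
    """Center-to-center spacing in micrometers for a square grid of spots.

    Assumes a regular square grid (an illustrative assumption).
    """
    if spots_per_cm2 <= 0:
        raise ValueError("density must be positive")
    # One spot occupies 1/density cm^2; the side of that square is the pitch.
    # 1 cm = 10,000 micrometers.
    return 10_000.0 / sqrt(spots_per_cm2)

pitch = spot_pitch_um(2300)
print(f"2,300 spots/cm^2 corresponds to a pitch of about {pitch:.0f} micrometers")
```

Under that assumption the spacing comes out at roughly 0.2 mm, which is why print-head precision and drying conditions matter so much for spot quality.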

Irrespective of their source, printers
should be run through a preprint sequence
prior to producing the actual experimental
arrays; the first 100 or so spots of a new run
tend to be somewhat variable. Factors affecting
spot reproducibility include slide treatment
homogeneity, sample differences, and
instrument errors. Other factors that come
into play include clean ejection of the drop
and clogging (nQUAD printing) and
mechanical variations and long-term alteration
in print-head surface of solid and split
pins. However, with careful preparation it is
possible to get a coefficient of variance for
spot reproducibility below 10%.
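
The reproducibility figure quoted above can be checked with a few lines of code. This is a minimal sketch, assuming the coefficient of variance is computed as the sample standard deviation divided by the mean of replicate spot intensities; the intensity values are hypothetical.

```python
from statistics import mean, stdev

def coefficient_of_variation(intensities):
    """Return the sample standard deviation / mean, as a percentage."""
    if len(intensities) < 2:
        raise ValueError("need at least two replicate spots")
    m = mean(intensities)
    if m == 0:
        raise ValueError("mean intensity is zero; CV is undefined")
    return 100.0 * stdev(intensities) / m

# Hypothetical intensities from replicate spots of the same sample,
# taken after the variable preprint spots have been discarded.
replicate_spots = [1020.0, 980.0, 1005.0, 1050.0, 965.0]
cv = coefficient_of_variation(replicate_spots)
verdict = "within" if cv < 10 else "outside"
print(f"CV = {cv:.1f}% ({verdict} the 10% target)")
```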

One potential printing problem is sample
carryover. Repeated washing, blotting, and
drying (vacuum) of print pins between samples
is normally effective at reducing sample
carryover to negligible amounts. Printing should
also be carried out in a controlled environment.
Humidified chambers are available in
which to place printers. These help prevent
dust contamination and produce a uniform
drying rate, which is important in determining
spot size, quality, and reproducibility.

In summary, although several printing
technologies are available, none are
particularly outstanding and the bottom line
is that they are still in a relatively early stage
of evolution.

Array Hybridization 

The hybridization protocol is, practically
speaking, relatively straightforward, and those
with previous experience in blotting should
have little difficulty. Array hybridizations
are, in essence, reverse Southern/Northern
blots: instead of applying a labeled probe to
the target population of DNA/RNA, the
labeled population is applied to the probe(s).
With membrane-based arrays, the control and
treated mRNA populations are normally
converted to cDNA and labeled with isotope (e.g.,
radioactive phosphorus) in the process. These
labeled populations are then hybridized
independently to parallel or serial arrays and the
hybridization signal is detected with a
phosphorimager. A less commonly used alternative
to radioactive probes is enzymatic detection.
The probe may be biotinylated, haptenylated,
or have alkaline phosphatase/horseradish
peroxidase attached. Hybridization is detected by
enzymatic reaction yielding a color reaction (4).
Differences in hybridization signals can be
detected by eye or, more accurately, with the
help of digital imaging and commercially
available software. The labeling of the test
populations for slide-based microarrays uses a
slightly different approach. The probe typically
consists of two samples of polyA+ RNA (usually
from a treated and a control population) that
are converted to cDNA; in the process each is
labeled with a different fluor. The independently
labeled probes are then mixed together and
hybridized to a single microarray slide and the
resulting combined fluorescent signal is scanned. After
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Figure 1. Genetic Microsystems (Woburn, MA) pin
and ring system for printing arrays. The pin and ring
combination consists of a circular open ring oriented
parallel to the sample solution, with a vertical pin
centered over the ring. When the ring is dipped
into a solution and lifted, it withdraws an aliquot
of sample held by surface tension. To spot the
sample, the pin is driven down through the ring
and a portion of the solution is transferred to the
bottom of the pin. The pin continues to move
downward until the pendant drop of solution
makes contact with the underlying surface. The
pin is then lifted, and gravity and surface tension
cause deposition of the spot onto the array.
Figure from Flowers et al. (14), with permission
from Genetic Microsystems.

normalization, it is possible to determine the
ratio of fluorescent signals from a single
hybridization of a slide-based microarray.
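
The normalization and ratio step for a two-color slide can be sketched as follows. The text does not say which normalization was used; this example assumes a simple global scaling that equalizes the median intensity of the two channels (reasonable only if most genes are unchanged), and all intensity values are hypothetical.

```python
from math import log2
from statistics import median

def normalized_log_ratios(treated, control):
    """Return log2(treated/control) per spot after global median scaling.

    Global median scaling is an assumed method, chosen for illustration.
    Intensities are expected to be positive, background-subtracted values.
    """
    if len(treated) != len(control):
        raise ValueError("channels must have one value per spot")
    if min(treated) <= 0 or min(control) <= 0:
        raise ValueError("intensities must be positive")
    # Rescale the treated channel so both channels share the same median.
    scale = median(control) / median(treated)
    return [log2(t * scale / c) for t, c in zip(treated, control)]

# Hypothetical intensities: the treated channel is uniformly brighter
# (a labeling artifact), and only the last spot is truly induced.
treated_signal = [210.0, 420.0, 840.0, 1680.0, 4200.0]
control_signal = [100.0, 200.0, 400.0, 800.0, 500.0]
ratios = normalized_log_ratios(treated_signal, control_signal)
for spot, r in enumerate(ratios, start=1):
    print(f"spot {spot}: log2 ratio = {r:+.2f}")
```

Working in log2 units makes induction and repression symmetric: a fourfold increase and a fourfold decrease appear as +2 and -2.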

cDNA derived from control and treated
populations of RNA is most commonly
hybridized to arrays, although subtractive
hybridization or differential display reactions
may also be used. Fluorophore- or radiolabeled
nucleotides are directly incorporated
into the cDNA in the process of converting
RNA to cDNA. Alternatively, 5' end-labeled
primers may be used for cDNA synthesis.
These are labeled with a fluorophore for
direct visualization of the hybridized array.
Alternatively, biotin or a hapten may be
attached to the primer, in which case fluor-
labeled streptavidin or antibody must be
applied before a signal can be generated. The
most commonly used fluorophores at present
are cyanine (Cy)3 and Cy5 (Amersham
Pharmacia Biotech AB, Uppsala, Sweden).
However, the relative expense of these
fluorescent conjugates has driven a search for
cheaper alternatives. Fluorescein, rhodamine,
and Texas red have all been used, and
companies such as Molecular Probes, Inc.
(Eugene, OR) are developing a series of
labeled nucleotides with a wide range of
excitation and emission spectra which may prove
to function as well as the Cy dyes.
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Table 1. Advantages and disadvantages of different microarray scanning systems
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Analysis of DNA Microarrays 

Membrane-based arrays are normally analyzed
on film or with a phosphorimager, whereas
chip-based arrays require more specialized
scanning devices. These can be divided into three
main groups: the charge-coupled device camera
systems, the nonconfocal laser scanners, and the
confocal laser scanners. The advantages and
disadvantages of each system are listed in Table 1.

Because a typical spot on a microarray can
contain > 10^[...] molecules, it is clear that a large
variation in signal strength may occur.
Current scanners cannot work across this
many orders of magnitude (4 or 5 is more
typical). However, the scanning parameters can
normally be adjusted to collect more or less
signal, such that two or three scans of the same
array should permit the detection of rare and
abundant genes.
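
One way to combine scans taken at different settings is sketched below. This is an illustration under stated assumptions, not a description of any scanner's software: it assumes two scans of the same array, a known gain ratio between them, and a detector that saturates at a fixed ceiling (65,535 is used here as a hypothetical 16-bit limit).

```python
def merge_scans(low_gain, high_gain, gain_ratio, saturation=65535):
    """Merge two scans of the same array into one extended-range signal.

    The high-gain reading is used (rescaled to low-gain units) unless the
    detector saturated there, in which case the low-gain reading is kept.
    gain_ratio and saturation are assumed to be known for the instrument.
    """
    if len(low_gain) != len(high_gain):
        raise ValueError("scans must cover the same spots")
    merged = []
    for low, high in zip(low_gain, high_gain):
        if high < saturation:
            merged.append(high / gain_ratio)  # rare message: trust high gain
        else:
            merged.append(float(low))         # abundant message: high gain clipped
    return merged

# Hypothetical readings for a rare, a moderate, and an abundant transcript.
low_scan = [0, 150, 9000]
high_scan = [4, 1500, 65535]
print(merge_scans(low_scan, high_scan, gain_ratio=10))
```

The rare transcript is invisible in the low-gain scan, and the abundant one is clipped in the high-gain scan; merging recovers both.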

When a microarray is scanned, the fluorescent
images are captured by software normally
included with the scanner. Several commercial
suppliers provide additional software for
quantifying array images, but the software tools are
constantly evolving to meet the developing
needs of researchers, and it is prudent to
define one's own needs and clarify the exact
capabilities of the software before its purchase.
Issues that should be considered include the
following:

• Can the software locate offset spots?

• Can it quantitate across irregular hybridization
signals?

• Can the arrayed genes be programmed in for
easy identification and location?

• Can the software connect via the Internet to
databases containing further information on
the gene(s) of interest?

One of the key issues raised at the workshop
was the sensitivity of microarray technology.
Experiments by General Scanning, Inc.
(Watertown, MA), have shown that by using
the Cy dyes and their scanner, signal can be
detected down to levels of < 1 fluor molecule
per square micrometer, which translates to
detecting a rare message at approximately one
copy per cell or less.

Array Applications 

Although arrays are an emerging technology
certain to undergo improvement and
alteration, they have already been applied
usefully to a number of model systems. Arrays are
at their most powerful when they contain the
entire genome of the species they are being
used to study. For this reason, they have strong
support among researchers utilizing yeast and
Caenorhabditis elegans (5). The genomes of
both of these species have been sequenced and,
in the case of yeast, deposited onto arrays for
examination of gene expression (6,7). With
both of these species, it is relatively easy to
perturb individual gene expression. Indeed, C.



CCD, charge-coupled device.
From Kawasaki (13).

elegans knockouts can be made simply by
soaking the worms in an antisense solution of
the gene to be knocked out.

By a process of systematic gene disruption,
it is now possible to examine the cause
and effect relationships between different
genes in these simple organisms. This kind of
approach should help elucidate biochemical
pathways and genetic control processes,
deconvolute polygenic interactions, and
define the architecture of the cellular network.
A simple case study of how this can be
achieved was presented by Butow [University
of Texas Southwestern Medical Center,
Dallas, TX (Figure 2)]. Although it is the
phenotypic result of a single gene knockout
that is being examined, the effect of such
perturbation will almost always be polygenic.
Polygenic interactions will become increasingly
important as researchers begin to move
away from single gene systems when examining
the nature of toxicologic responses to
external stimuli. This is especially important
in toxicology because the phenotype produced
by a given environmental insult is
never the result of the action of a single gene;
rather, it is a complex interaction of one or
multiple cellular pathways. Phenomena such
as quantitative trait (the continuous variation
of phenotype), epistasis (the effect of alleles of
one or more genes on the expression of other
genes), and penetrance (proportion of
individuals of a given genotype that display a
particular phenotype) will become increasingly
evident and important as toxicologists push
toward the ultimate goal of matching the
responses of individuals to different
environmental stimuli.

Analysis of the transcriptome (the expression
level of all the genes in a given cell population)
was a use of arrays addressed by several
speakers. Unfortunately, current gene
nomenclature is often confusing in that single genes
are allocated multiple names (usually as a result
of independent discovery by different
laboratories), and there was a call for standardization of
gene nomenclature. Nevertheless, once a
transcriptome has been assembled it can then be
transferred onto arrays and used to screen any
chosen system. The EPA MicroArray
Consortium (EPAMAC) is assembling testes
transcriptomes for human, rat, and mouse. In a
slightly different approach, Nuwaysir et al.
describe how the NIEHS assembled what is
effectively a "toxicological transcriptome": a
library of human and mouse genes that have
previously been proven or implicated in
responses to toxicologic insults. Clontech
Laboratories, Inc. (Palo Alto, CA), has begun a
similar process by developing stress/toxicology
filter arrays of rat, mouse, and human genes.
Thus, rather than being tissue or cell specific,
these stress/toxicology arrays can be used across
a variety of model systems to look for
alterations in the expression of toxicologically
important genes and define the new field of
toxicogenomics. The potential to identify
toxicant families based on tissue- or cell-specific
gene expression could revolutionize drug
testing. These molecular signatures or fingerprints
could not only point to the possible
toxicity/carcinogenicity of newly discovered
compounds (Figure 3), but also aid in elucidating
their mechanism of action through
identification of gene expression networks. By
extension, such signatures could provide easily
identifiable biomarkers to assess the degree, time,
and nature of exposure.
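
The idea of matching a test compound's expression signature against known toxicant families can be sketched as a simple similarity search. The text does not specify a matching method; this example assumes Pearson correlation of log2 expression ratios over a shared gene set, with a hypothetical cutoff of 0.9. The family names and all values are invented for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("a constant profile has no defined correlation")
    return cov / (sx * sy)

def classify(test_profile, family_signatures, threshold=0.9):
    """Return the best-matching family name, or None if nothing reaches threshold."""
    best_family, best_r = None, threshold
    for family, signature in family_signatures.items():
        r = pearson(test_profile, signature)
        if r >= best_r:
            best_family, best_r = family, r
    return best_family

# Hypothetical reference signatures: log2 ratios for the same five genes.
families = {
    "peroxisome proliferator": [2.0, 1.5, -1.0, 0.0, 0.5],
    "genotoxic agent": [-0.5, 0.0, 2.0, 1.5, -1.0],
}
compound_1 = [1.8, 1.6, -0.9, 0.1, 0.4]
compound_2 = [0.1, -0.2, 0.0, 0.2, -0.1]
print("compound 1:", classify(compound_1, families))
print("compound 2:", classify(compound_2, families))
```

A compound that matches no reference family is not thereby shown to be safe; it simply does not resemble any signature in the reference set.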

DNA arrays are primarily a tool for examining
differential gene expression in a given
model. In this context they are referred to as
closed systems because they lack the ability of
other differential expression technologies, e.g.,
differential display and subtractive hybridization,
to detect previously unknown genes not
present on the array. This would appear to
limit the power of DNA arrays to the imaginations
and preconceptions of the researcher in
selecting genes previously characterized and
thought to be involved in the model system.
However, the various genome sequencing
projects have created a new category of
sequence, the EST, that has partially mollified
this deficiency. ESTs are cDNAs expressed
in a given tissue that, although they may share
some degree of sequence similarity to previously
characterized genes, have not been assigned
specific genetic identity. By incorporating EST
clones into an array, it is possible to monitor
the expression of these unknown genes. This
can enable the identification of previously
uncharacterized genes that may have biologic
significance in the model system. Filter arrays
from Research Genetics and slide arrays from
Incyte Pharmaceuticals both incorporate large
numbers of ESTs from a variety of species.

A further use of microarrays is the identification
of single nucleotide polymorphisms
(SNPs). These genomic variations are abundant
(they occur approximately every 1 kb or
so) and are the basis of restriction fragment
length polymorphism analysis used in forensic
analysis. Affymetrix, Inc. designed chips that
contain multiple repeats of the same gene
sequence. Each position is present with all four
possible bases. After the hybridization of the
sample, the degree of hybridization to the
different sequences can be measured and the exact
sequence of the target gene deduced. SNPs are
thought to be of vital importance in drug
metabolism and toxicology. For example, single
base differences in the regulatory region or
active site of some genes can account for huge
differences in the activity of that gene. Such
SNPs are thought to explain why some people
are able to metabolize certain xenobiotics better
than others. Thus, arrays provide a further
tool for the toxicologist investigating the
nature of susceptible subpopulations and
toxicologic response.
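
The four-probes-per-position scheme described above lends itself to a short illustration. This is a minimal sketch, not any vendor's base-calling algorithm: it assumes one hybridization intensity per candidate base at each position and calls the strongest base only when it exceeds the runner-up by a hypothetical twofold margin. All intensities are invented.

```python
BASES = "ACGT"

def call_base(intensities, min_ratio=2.0):
    """Call one position from the signals of its four probes.

    intensities maps each base (A, C, G, T) to the hybridization signal
    of the probe carrying that base. Returns 'N' when the strongest
    signal does not clearly exceed the second strongest.
    """
    ranked = sorted(BASES, key=lambda b: intensities[b], reverse=True)
    best, runner_up = ranked[0], ranked[1]
    if intensities[best] >= min_ratio * intensities[runner_up]:
        return best
    return "N"

def call_sequence(probe_sets, min_ratio=2.0):
    """Deduce a sequence by calling each position in order."""
    return "".join(call_base(p, min_ratio) for p in probe_sets)

# Hypothetical signals for three consecutive positions.
probe_sets = [
    {"A": 900, "C": 50, "G": 40, "T": 60},    # clear A
    {"A": 30, "C": 45, "G": 1200, "T": 25},   # clear G
    {"A": 500, "C": 480, "G": 20, "T": 35},   # two strong signals: ambiguous
]
print(call_sequence(probe_sets))
```

An ambiguous position with two comparably strong signals may reflect a heterozygous individual rather than noise, so a real analysis would treat such calls as candidates for follow-up rather than discarding them.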

There are still many wrinkles to be ironed
out before arrays become a standard tool for
toxicologists. The main issues raised at the
workshop by those with hands-on experience
were the following:

• Expense: the cost of purchasing/contracting
this technology is still too great for many
individual laboratories.
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Figure 2. Potential effects of gene knockout within
positively and negatively regulated gene expression
networks. [...] is limiting in wild type for expression of
[...]. (A) A simple, two-component linear regulatory
network operating on gene [...], where [...] is a positive
effector of [...] and [...] is either a positive or negative
effector of [...]. This network could be deduced by
examining the consequence of (B) deleting [...] on the
expression of [...] and [...], where the expression of [...]
would be decreased or increased depending on
whether [...] was a positive or negative regulator.
These and other connected components of even
greater complexity could be revealed by genome-
wide expression analysis. From Butow (15).



• Clones: the logistics of identifying, obtaining,
and maintaining a set of nonredundant,
noncontaminated, sequence-verified,
species/cell/tissue/field-specific clones.

• Use of inbred strains: where whole-organism
models are being used, the use of inbred
strains is important to reduce the potentially
confusing effects of the individual variation
typically seen in outbred populations.

• Probe: the need for relatively large amounts
of RNA, which limits the type of sample
(e.g., biopsy) that can be used. Also, different
RNA extraction methods can give different
results.

• Specificity: the ability to discriminate
accurately between closely related genes (e.g., the
cytochrome P450 family) and splice variants.

• Quantitation: the quantitation of gene
expression using gene arrays is still open to
debate. One reason for this is the different
incorporation of the labeling dyes. However,
the main difficulty lies in knowing what to
normalize against. One option is to include a
large number of so-called housekeeping genes
in the array. However, the expression of these
genes often changes depending on the tissue
and the toxicant, so it is necessary to
characterize the expression of these genes in the
model system before utilizing them. This is
clearly not a viable option when screening
multiple new compounds. A second option
is to include on the array genes from a
nonrelated species (e.g., a plant gene on an animal
array) and to spike the probe with synthetic
RNA(s) complementary to the gene(s).

• Reproducibility: this is sometimes
questionable, and a figure of approximately two or
three repeats was used as the minimum
number required to confirm initial findings.
Again, however, most people [...]
use of Northern blots or reverse transcriptase
PCR to confirm findings.

• Sensitivity: concerns were voiced about the
number of target molecules that must be
present in a sample for them to be detected on
the array.

• Efficiency: reproducible identification of 1.5-
to 2-fold differences in expression was
reported, although the number of genes that
undergo this level of change and remain
undetected is open to debate. It is important
that this level of detection be ultimately
achieved because it is commonly perceived
that some important transcription factors
and their regulators respond at such low
levels. In most cases, 3- to [...]-fold was the
minimum change that most were happy to
accept.

• Bioinformatics: perhaps the greatest concern
was how to interpret the data with
the greatest accuracy and efficiency. The
biggest headache is trying to identify
networks of gene expression that are common to
different treatments or doses. The amount of
data from a single experiment is huge. It may
be that, in the future, several groups
individually equipped with specialized software
algorithms for studying their favorite genes or
gene systems will be able to share the same
hybridized chips. Thus, arrays could usher in
a new perspective on collaboration and the
sharing of data.
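
The reproducibility and efficiency points in the list above can be combined into a simple filtering rule, sketched here. It assumes a gene is reported only if it changes at least 3-fold in the same direction in every replicate and at least two replicates exist; these cutoffs echo the figures quoted by workshop participants but are parameters, not a standard. Gene names and ratios are hypothetical.

```python
def consistent_changes(replicate_ratios, min_fold=3.0, min_replicates=2):
    """Return genes whose treated/control ratio is consistent across replicates.

    replicate_ratios maps a gene name to a list of treated/control ratios,
    one per replicate hybridization. A gene is called 'up' or 'down' only
    if every replicate exceeds min_fold in the same direction.
    """
    calls = {}
    for gene, ratios in replicate_ratios.items():
        if len(ratios) < min_replicates:
            continue  # too few repeats to confirm an initial finding
        if all(r >= min_fold for r in ratios):
            calls[gene] = "up"
        elif all(r <= 1.0 / min_fold for r in ratios):
            calls[gene] = "down"
    return calls

# Hypothetical ratios from up to three replicate hybridizations.
ratios_by_gene = {
    "gene_a": [3.5, 4.1, 3.2],   # consistently induced
    "gene_b": [0.2, 0.25, 0.3],  # consistently repressed
    "gene_c": [3.4, 1.1, 2.9],   # not reproducible
    "gene_d": [5.0],             # only one replicate
}
print(consistent_changes(ratios_by_gene))
```

Lowering `min_fold` toward 1.5 would capture the subtler changes discussed under Efficiency, at the cost of more false positives unless the number of replicates is increased.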

EPAMAC 

Perhaps the main reason most scientists are
unable to use array technology is the high cost
involved, whether buying off-the-shelf
membranes, using contract printing services, or
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Figure 3. Gene expression profiles, also called fingerprints or signatures, of known toxicants or
toxicant families may, in the future, be used to identify the [...]
[...] example, the genetic signature of test compound 1 is identical to that of known peroxisome [...],
whereas that of test compound 2 does not match any known toxicant family. Based on these results, test
compound 2 would be retained for further testing and test compound 1 would be eliminated.
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producing chips in-house. In view of this,
researchers at the RTD/NHEERL initiated
the EPAMAC. This consortium brings
together scientists from the EPA and a number
of extramural labs with the aim of developing
microarray capability through the sharing
of resources and data. EPAMAC
researchers are primarily interested in the
developmental and toxicologic changes seen
in testicular and breast tissue, and a portion
of the workshop was set aside for EPAMAC
members to share their ideas on how the
experimental application of microarrays could
facilitate their research. One of the central
areas of interest to EPAMAC members is the
effect of xenobiotics on male fertility and
reproductive health. Of greatest concern is
the effect of exposure during critical periods
of development and germ cell differentiation
(9), and how this may compromise sperm
counts and quality following sexual maturation
(10). As well as spermatogenic tissue,
there is also interest in how residual mRNA
found in mature sperm (11) could be used as
an indicator of previous xenobiotic effects (it
is easier to obtain a semen sample than a
testicular biopsy). Arrays will be used to examine
and compare the effect of exposure to heat
and chemicals in testicular and epididymal
gene expression profiles, with the aim of
establishing relationships/associations
between changes in developmental landmarks
and the effects on sperm count and quality.
Cluster, pattern, and other analysis of such
data should help identify hidden relationships
between genes that may reveal potential
mechanisms of action and uncover roles for
genes with unknown functions.

Summary

The full impact of DNA arrays may not be seen for several years, but the
interest shown at this regional workshop indicates the high level of
interest that they foster. Apart from educating and advertising the
various technologies in this field, this workshop brought together a
number of researchers from the Research Triangle Park area who are
already using DNA arrays. The interest in sharing ideas and experiences
led to the initiation of a Triangle array user's group.
Array technology is still in its infancy. This means that the hardware is
still improving and there is no current consensus for standard
procedures, quantitation, and interpretation. Consistency in spotting and
scanning arrays is not yet optimized, and this is one of the most
critical requirements of any experiment. In addition, one of the dark
regions of array technology, strife in the courts over who owns what
portions of it, has further muddled the future and is a potential barrier
toward the development of consensus procedures.

Perhaps the greatest hurdle for the application of arrays is the actual
interpretation of data. No specialists in bioinformatics attended the
workshop, largely because they are rare and because as yet no one seems
clear on the best method of approaching data analysis and interpretation.
Cross-referencing results from multiple experiments (time, dose, repeats,
different animals, different species) to identify commonly expressed
genes is a great challenge. In most cases, we are still a long way from
understanding how the expression of gene X is related to the expression
of gene Y and ordering gene expression to delineate causal relationships.
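
As a rough illustration of the cross-referencing step described above, the following minimal Python sketch counts how many experiments flag each gene and keeps the genes seen in at least a chosen number of them. It is not a method from the workshop: the function name, the `min_experiments` parameter, and the gene and experiment labels are all hypothetical, and it assumes each experiment has already been reduced to a set of gene identifiers.

```python
from collections import Counter


def commonly_expressed(experiments, min_experiments=None):
    """Return genes flagged in at least `min_experiments` experiments.

    `experiments` maps an experiment label to a set of gene identifiers.
    If `min_experiments` is None, a gene must appear in every experiment.
    """
    if not experiments:
        return set()
    if min_experiments is None:
        min_experiments = len(experiments)
    # Count each gene once per experiment, even if it is listed twice.
    counts = Counter(
        gene for genes in experiments.values() for gene in set(genes)
    )
    return {gene for gene, n in counts.items() if n >= min_experiments}


# Hypothetical data: genes flagged as changed in each experiment.
experiments = {
    "dose_low": {"geneA", "geneB", "geneC"},
    "dose_high": {"geneA", "geneB", "geneD"},
    "repeat_1": {"geneA", "geneC"},
}

shared_by_all = commonly_expressed(experiments)
shared_by_two = commonly_expressed(experiments, min_experiments=2)
```

A simple count like this only identifies overlap between gene lists; it says nothing about how the expression of one gene relates to another or about causal ordering, which is the harder problem the paragraph describes.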

To the ordinary scientist in the typical laboratory, however, the most
immediate problem is a lack of affordable instrumentation. One can
purchase premade membranes at relatively affordable prices. Although
these may be useful in identifying individual genes to pursue in more
detail using other methods, the numbers that would be required for even a
small routine toxicology experiment prohibit this as a truly viable
approach. For the toxicologist, there is a need to carry out multiple
experiments: dose responses, time curves, multiple animals, and repeats.
Glass-based DNA arrays are most attractive in this context because they
can be prepared in large batches from the same DNA source and accommodate
control and treated samples on the same chip. Another problem with
current off-the-shelf arrays is that they often do not contain one or
more of the particular genes a group is interested in. One alternative is
to obtain and/or produce a set of custom clones and have contract
printing of membranes or slides carried out by a company such as Genomic
Solutions, Inc. (Ann Arbor, MI). This approach
is less expensive than laying out capital for one's own entire system,
although at some point it might make economic sense to print one's own
arrays.

Finally, DNA arrays are currently a team effort. They are a technology
that uses a wide range of skills including engineering, statistics,
molecular biology, chemistry, and bioinformatics. Because most
individuals are skilled in only one or perhaps two of these areas, it
appears that success with arrays may be best expected by teams of
collaborators consisting of individuals having each of these skills.

Those considering array applications may be amused or goaded on by the
following quote from Fortune magazine (13):

Microprocessors have reshaped our economy, spawned vast fortunes and
changed the way we live. Gene chips could be even bigger.

Although this comment may have been designed to excite the imagination
rather than accurately reflect the truth, it is fair to say that the age
of functional genomics is upon us. DNA arrays look set to be an important
tool in this new age of biotechnology and will likely contribute answers
to some of toxicology's most fundamental questions.
Subject: RE: [Fwd: Toxicology Chip]
Date: Mon, 3 Jul 2000 08:09:45 -0400
From: "Afshari, Cynthia" <afshari@niehs.nih.gov>
To: "'Diana Hamlet-Cox'" <dianahc@incyte.com>



You can see the list of clones that we have on our 12K chip at
http://manuel.niehs.nih.gov/maps-guest/clonesrch.cfm

We selected a subset of genes (2000K) that we believed critical to tox
response and basic cellular processes and added a set of clones and ESTs to
this. We have included a set of control genes (80+) that were selected by
the NHGRI because they did not change across a large set of array
experiments. However, we have found that some of these genes change
significantly after tox treatments and are in the process of looking at the
variation of each of these 80+ genes across our experiments.
Our chips are constantly changing and being updated and we hope that our 
data will lead us to what the toxchip should really be. 
I hope this answers your question. 
Cindy Afshari 



> From: Diana Hamlet-Cox
> Sent: Monday, June 26, 2000 8:52 PM
> To: afshari@niehs.nih.gov
> Subject: [Fwd: Toxicology Chip]
>
> Dear Dr. Afshari,
>
> Since I have not yet had a response from Bill Grigg, perhaps he was not
> the right person to contact.
>
> Can you help me in this matter? I don't need to know the sequences,
> necessarily, but I would like very much to know what types of sequences
> are being used, e.g., GPCRs (more specific?), ion channels, etc.
>
> Diana Hamlet-Cox
>

> Original Message
> Subject: Toxicology Chip
> Date: Mon, 19 Jun 2000 18:31:48 -0700
> From: Diana Hamlet-Cox <dianahc@incyte.com>
> Organization: Incyte Pharmaceuticals
> To: grigg@niehs.nih.gov
>
> Dear Colleague:
>
> I am doing literature research on the use of expressed genes as
> pharmacotoxicology markers, and found the Press Release dated February
> 29, 2000 regarding the work of the NIEHS in this area. I would like to
> know if there is a resource I can access (or you could provide?) that
> would give me a list of the 12,000 genes that are on your Human ToxChip
> Microarray. In particular, I am interested in the criteria used to
> select sequences for the ToxChip, including any control sequences
> included in the microarray.
>
> Thank you for your assistance in this request.
>
> Diana Hamlet-Cox, Ph.D.
> Incyte Genomics, Inc.
>
> --






> This email message is for the sole use of the intended recipient(s) and
> may contain confidential and privileged information subject to
> attorney-client privilege. Any unauthorized review, use, disclosure or
> distribution is prohibited. If you are not the intended recipient,
> please contact the sender by reply email and destroy all copies of the
> original message.

> 
> 